Preface


This manual contains solutions to all of the exercises in Probability and Statistics, 4th edition, by Morris 
DeGroot and myself. I have preserved most of the solutions to the exercises that existed in the 3rd edition. 
Certainly errors have been introduced, and I will post any errors brought to my attention on my web page 
http://www.stat.cmu.edu/~mark/ along with errors in the text itself. Feel free to send me comments.

For instructors who are familiar with earlier editions, I hope that you will find the 4th edition at least as 
useful. Some new material has been added, and little has been removed. Assuming that you will be spending 
the same amount of time using the text as before, something will have to be skipped. I have tried to arrange 
the material so that instructors can choose what to cover and what not to cover based on the type of course 
they want. This manual contains commentary on specific sections right before the solutions for those sections. 
This commentary is intended to explain special features of those sections and help instructors decide which 
parts they want to require of their students. Special attention is given to more challenging material and how 
the remainder of the text does or does not depend upon it. 

To teach a mathematical statistics course for students with a strong calculus background, one could safely 
cover all of the material for which one could find time. The Bayesian sections include 4.8, 7.2, 7.3, 7.4, 8.6, 
9.8, and 11.4. One can choose to skip some or all of this material if one desires, but that would be ignoring 
one of the unique features of the text. The more challenging material in Sections 7.7-7.9 and 9.2-9.4 is really
only suitable for a mathematical statistics course. One should try to make time for some of the material in
Sections 12.1-12.3 even if it means cutting back on some of the nonparametrics and two-way ANOVA. To teach
a more modern statistics course, one could skip Sections 7.7-7.9, 9.2-9.4, 10.8, and 11.7-11.8. This would
leave time to discuss robust estimation (Section 10.7) and simulation (Chapter 12). Section 3.10 on Markov 
chains is not actually necessary even if one wishes to introduce Markov chain Monte Carlo (Section 12.5), 
although it is helpful for understanding what this topic is about. 


Using Statistical Software 


The text was written without reference to any particular statistical or mathematical software. However, 
there are several places throughout the text where references are made to what general statistical software 
might be able to do. This is done for at least two reasons. One is that different instructors who wish to use 
statistical software while teaching will generally choose different programs. I didn’t want the text to be tied 
to a particular program to the exclusion of others. A second reason is that there are still many instructors 
of mathematical probability and statistics courses who prefer not to use any software at all.

Given how pervasive computing is becoming in the use of statistics, the second reason above is becoming 
less compelling. Given the free and multiplatform availability and the versatility of the environment R, even 
the first reason is becoming less compelling. Throughout this manual, I have inserted pointers to which R 
functions will perform many of the calculations that would formerly have been done by hand when using this 
text. The software can be downloaded for Unix, Windows, or Mac OS from 
http://www.r-project.org/ 

That site also has manuals for installation and use. Help is also available directly from within the R environment.

Many tutorials for getting started with R are available online. At the official R site there is the detailed 
manual: http://cran.r-project.org/doc/manuals/R-intro.html
that starts simple and has a good table of contents and lots of examples. However, reading it from start to 
finish is not an efficient way to get started. The sample sessions should be most helpful. 

One major issue with using an environment like R is that it is essentially programming. That is, students 
who have never programmed seriously before are going to have a steep learning curve. Without going into 
the philosophy of whether students should learn statistics without programming, the field is moving in the 
direction of requiring programming skills. People who want only to understand what a statistical analysis 
is about can still learn that without being able to program. But anyone who actually wants to do statistics
as part of their job will be seriously handicapped without programming ability. At the end of this manual
is a series of heavily commented R programs that illustrate many of the features of R in the context of a
specific example from the text. 


Mark J. Schervish 
Chapter 1


Introduction to Probability 


1.2 Interpretations of Probability 


Commentary 


It is interesting to have the students determine some of their own subjective probabilities. For example, let
$X$ denote the temperature at noon tomorrow outside the building in which the class is being held. Have each
student determine a number $x_1$ such that the student considers the following two possible outcomes to be
equally likely: $X < x_1$ and $X > x_1$. Also, have each student determine numbers $x_2$ and $x_3$ (with $x_2 < x_3$) such
that the student considers the following three possible outcomes to be equally likely: $X < x_2$, $x_2 < X < x_3$,
and $X > x_3$. Determinations of more than three outcomes that are considered to be equally likely can also
be made. The different values of $x_1$ determined by different members of the class should be discussed, and
also the possibility of getting the class to agree on a common value of $x_1$.

Similar determinations of equally likely outcomes can be made by the students in the class for quantities 
such as the following ones which were found in the 1973 World Almanac and Book of Facts: the number 
of freight cars that were in use by American railways in 1960 (1,690,396), the number of banks in the 
United States which closed temporarily or permanently in 1931 on account of financial difficulties (2,294), 
and the total number of telephones which were in service in South America in 1971 (6,137,000). 


1.4 Set Theory 


Solutions to Exercises 


1. Assume that $x \in B^c$. We need to show that $x \in A^c$. We shall show this indirectly. Assume, to the
contrary, that $x \in A$. Then $x \in B$ because $A \subset B$. This contradicts $x \in B^c$. Hence $x \in A$ is false and
$x \in A^c$.


2. First, show that $A \cap (B \cup C) \subset (A \cap B) \cup (A \cap C)$. Let $x \in A \cap (B \cup C)$. Then $x \in A$ and $x \in B \cup C$.
That is, $x \in A$ and either $x \in B$ or $x \in C$ (or both). So either ($x \in A$ and $x \in B$) or ($x \in A$
and $x \in C$) or both. That is, either $x \in A \cap B$ or $x \in A \cap C$. This is what it means to say that
$x \in (A \cap B) \cup (A \cap C)$. Thus $A \cap (B \cup C) \subset (A \cap B) \cup (A \cap C)$. Basically, running these steps backwards
shows that $(A \cap B) \cup (A \cap C) \subset A \cap (B \cup C)$.


3. To prove the first result, let $x \in (A \cup B)^c$. This means that $x$ is not in $A \cup B$. In other words, $x$ is
neither in $A$ nor in $B$. Hence $x \in A^c$ and $x \in B^c$. So $x \in A^c \cap B^c$. This proves that $(A \cup B)^c \subset A^c \cap B^c$.
Next, suppose that $x \in A^c \cap B^c$. Then $x \in A^c$ and $x \in B^c$. So $x$ is neither in $A$ nor in $B$, so it can't be
in $A \cup B$. Hence $x \in (A \cup B)^c$. This shows that $A^c \cap B^c \subset (A \cup B)^c$. The second result follows from
the first by applying the first result to $A^c$ and $B^c$ and then taking complements of both sides.
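The identities proved in Exercises 1-3 can also be spot-checked by brute force on a small finite sample space. The following Python sketch is only an illustration and is no substitute for the proofs. The sample space `S = {0, 1, 2, 3}` and the helper names `subsets`, `complement`, and `check_identities` are assumptions introduced here, not part of the text.

```python
from itertools import combinations

# Hypothetical small sample space used only for this check.
S = frozenset(range(4))

def subsets(universe):
    """Return every subset of `universe` as a frozenset."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def complement(A):
    """Complement of A relative to the sample space S."""
    return S - A

def check_identities():
    """Check Exercises 1-3 for all events A, B, C that are subsets of S."""
    events = subsets(S)
    for A in events:
        for B in events:
            # Exercise 1: A subset of B implies B^c subset of A^c.
            if A <= B and not complement(B) <= complement(A):
                return False
            # Exercise 3: De Morgan's laws.
            if complement(A | B) != complement(A) & complement(B):
                return False
            if complement(A & B) != complement(A) | complement(B):
                return False
            for C in events:
                # Exercise 2: distributive property.
                if A & (B | C) != (A & B) | (A & C):
                    return False
    return True

print(check_identities())
```

A check of this kind covers only the 16 events of one four-point sample space, so it can expose a misstated identity but cannot replace the general argument.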


4. To see that $A \cap B$ and $A \cap B^c$ are disjoint, let $x \in A \cap B$. Then $x \in B$, hence $x \notin B^c$ and so $x \notin A \cap B^c$. So
no element of $A \cap B$ is in $A \cap B^c$, hence the two events are disjoint. To prove that $A = (A \cap B) \cup (A \cap B^c)$,
we shall show that each side is a subset of the other side. First, let $x \in A$. Either $x \in B$ or $x \in B^c$. If
$x \in B$, then $x \in A \cap B$. If $x \in B^c$, then $x \in A \cap B^c$. Either way, $x \in (A \cap B) \cup (A \cap B^c)$. So every
element of $A$ is an element of $(A \cap B) \cup (A \cap B^c)$ and we conclude that $A \subset (A \cap B) \cup (A \cap B^c)$. Finally,
let $x \in (A \cap B) \cup (A \cap B^c)$. Then either $x \in A \cap B$, in which case $x \in A$, or $x \in A \cap B^c$, in which
case $x \in A$. Either way $x \in A$, so every element of $(A \cap B) \cup (A \cap B^c)$ is also an element of $A$ and
$(A \cap B) \cup (A \cap B^c) \subset A$.


5. To prove the first result, let $x \in (\bigcup_i A_i)^c$. This means that $x$ is not in $\bigcup_i A_i$. In other words, for every
$i \in I$, $x$ is not in $A_i$. Hence for every $i \in I$, $x \in A_i^c$. So $x \in \bigcap_i A_i^c$. This proves that $(\bigcup_i A_i)^c \subset \bigcap_i A_i^c$.
Next, suppose that $x \in \bigcap_i A_i^c$. Then $x \in A_i^c$ for every $i \in I$. So for every $i \in I$, $x$ is not in $A_i$. So $x$
can't be in $\bigcup_i A_i$. Hence $x \in (\bigcup_i A_i)^c$. This shows that $\bigcap_i A_i^c \subset (\bigcup_i A_i)^c$. The second result follows from
the first by applying the first result to $A_i^c$ for $i \in I$ and then taking complements of both sides.
6. (a) Blue card numbered 2 or 4.
(b) Blue card numbered 5, 6, 7, 8, 9, or 10.
(c) Any blue card or a red card numbered 1, 2, 3, 4, 6, 8, or 10.
(d) Blue card numbered 2, 4, 6, 8, or 10, or red card numbered 2 or 4.
(e) Red card numbered 5, 7, or 9.


7. (a) These are the points not in $A$, hence they must be either below 1 or above 5. That is, $A^c = \{x : x < 1 \text{ or } x > 5\}$.

(b) These are the points in either $A$ or $B$ or both. So they must be between 1 and 5 or between 3 and
7. That is, $A \cup B = \{x : 1 \le x \le 7\}$.

(c) These are the points in $B$ but not in $C$. That is, $BC^c = \{x : 3 < x \le 7\}$. (Note that $B \subset C^c$.)

(d) These are the points in none of the three sets, namely $A^c B^c C^c = \{x : 0 < x < 1 \text{ or } x > 7\}$.

(e) These are the points in the answer to part (b) and in $C$. There are no such values and $(A \cup B)C = \emptyset$.


8. Blood type A reacts only with anti-A, so type A blood corresponds to $A \cap B^c$. Type B blood reacts
only with anti-B, so type B blood corresponds to $A^c B$. Type AB blood reacts with both, so $A \cap B$
characterizes type AB blood. Finally, type O reacts with neither antigen, so type O blood corresponds
to the event $A^c B^c$.


9. (a) For each $n$, $B_n = B_{n+1} \cup A_n$, hence $B_n \supset B_{n+1}$ for all $n$. For each $n$, $C_{n+1} \cap A_n = C_n$, so
$C_n \subset C_{n+1}$.

(b) Suppose that $x \in \bigcap_{n=1}^{\infty} B_n$. Then $x \in B_n$ for all $n$. That is, $x \in \bigcup_{i=n}^{\infty} A_i$ for all $n$. For $n = 1$, there
exists $i \ge n$ such that $x \in A_i$. Assume to the contrary that there are at most finitely many $i$ such
that $x \in A_i$. Let $m$ be the largest such $i$. For $n = m + 1$, we know that there is $i \ge n$ such that
$x \in A_i$. This contradicts $m$ being the largest $i$ such that $x \in A_i$. Hence, $x$ is in infinitely many
$A_i$. For the other direction, assume that $x$ is in infinitely many $A_i$. Then, for every $n$, there is a
value of $i \ge n$ such that $x \in A_i$, hence $x \in \bigcup_{i=n}^{\infty} A_i = B_n$ for every $n$ and $x \in \bigcap_{n=1}^{\infty} B_n$.

(c) Suppose that $x \in \bigcup_{n=1}^{\infty} C_n$. That is, there exists $n$ such that $x \in C_n = \bigcap_{i=n}^{\infty} A_i$, so $x \in A_i$ for
all $i \ge n$. So, there are at most finitely many $i$ (a subset of $1, \ldots, n-1$) such that $x \notin A_i$. Finally,
suppose that $x \in A_i$ for all but finitely many $i$. Let $k$ be the last $i$ such that $x \notin A_i$. Then $x \in A_i$
for all $i \ge k + 1$, hence $x \in \bigcap_{i=k+1}^{\infty} A_i = C_{k+1}$. Hence $x \in \bigcup_{n=1}^{\infty} C_n$.
10. (a) All three dice show even numbers if and only if all three of $A$, $B$, and $C$ occur. So, the event is
$A \cap B \cap C$.

(b) None of the three dice show even numbers if and only if all three of $A^c$, $B^c$, and $C^c$ occur. So, the
event is $A^c \cap B^c \cap C^c$.

(c) At least one die shows an odd number if and only if at least one of $A^c$, $B^c$, and $C^c$ occurs. So, the
event is $A^c \cup B^c \cup C^c$.

(d) At most two dice show odd numbers if and only if at least one die shows an even number, so
the event is $A \cup B \cup C$. This can also be expressed as the union of the three events of the form
$A \cap B \cap C^c$ where exactly one die shows odd, together with the three events of the form $A \cap B^c \cap C^c$
where exactly two dice show odd, together with the event $A \cap B \cap C$ where no dice show odd.

(e) We can enumerate all the sums that are no greater than 5: $1+1+1$, $2+1+1$, $1+2+1$, $1+1+2$,
$2+2+1$, $2+1+2$, and $1+2+2$. The first of these corresponds to the event $A_1 \cap B_1 \cap C_1$, the
second to $A_2 \cap B_1 \cap C_1$, etc. The union of the seven such events is what is requested, namely

\[ (A_1 \cap B_1 \cap C_1) \cup (A_2 \cap B_1 \cap C_1) \cup (A_1 \cap B_2 \cap C_1) \cup (A_1 \cap B_1 \cap C_2) \cup (A_2 \cap B_2 \cap C_1) \cup (A_2 \cap B_1 \cap C_2) \cup (A_1 \cap B_2 \cap C_2). \]


11. (a) All of the events mentioned can be determined by knowing the voltages of the two subcells. Hence
the following set can serve as a sample space

\[ S = \{(x, y) : 0 \le x \le 5 \text{ and } 0 \le y \le 5\}, \]

where the first coordinate is the voltage of the first subcell and the second coordinate is the voltage
of the second subcell. Any more complicated set from which these two voltages can be determined
could serve as the sample space, so long as each outcome could at least hypothetically be learned.

(b) The power cell is functional if and only if the sum of the voltages is at least 6. Hence, $A = \{(x, y) \in
S : x + y \ge 6\}$. It is clear that $B = \{(x, y) \in S : x = y\}$ and $C = \{(x, y) \in S : x > y\}$. The
power cell is not functional if and only if the sum of the voltages is less than 6. It needs less than
one volt to be functional if and only if the sum of the voltages is greater than 5. The intersection
of these two is the event $D = \{(x, y) \in S : 5 < x + y < 6\}$. The restriction "$\in S$" that appears
in each of these descriptions guarantees that the set is a subset of $S$. One could leave off this
restriction and add the two restrictions $0 \le x \le 5$ and $0 \le y \le 5$ to each set.

(c) The description can be worded as "the power cell is not functional, and needs at least one more
volt to be functional, and both subcells have the same voltage." This is the intersection of $A^c$, $D^c$,
and $B$. That is, $A^c \cap D^c \cap B$. The part of $D^c$ in which $x + y \ge 6$ is not part of this set because of
the intersection with $A^c$.

(d) We need the intersection of $A^c$ (not functional) with $C^c$ (second subcell at least as big as first) and
with $B^c$ (subcells are not the same). In particular, $C^c \cap B^c$ is the event that the second subcell is
strictly higher than the first. So, the event is $A^c \cap B^c \cap C^c$.


1.5 The Definition of Probability 


Solutions to Exercises 
1. Define the following events:
A = {the selected ball is red},
B = {the selected ball is white},
C = {the selected ball is either blue, yellow, or green}.

We are asked to find $\Pr(C)$. The three events $A$, $B$, and $C$ are disjoint and $A \cup B \cup C = S$. So
$1 = \Pr(A) + \Pr(B) + \Pr(C)$. We are told that $\Pr(A) = 1/5$ and $\Pr(B) = 2/5$. It follows that
$\Pr(C) = 2/5$.


2. Let $B$ be the event that a boy is selected, and let $G$ be the event that a girl is selected. We are told
that $B \cup G = S$, so $G = B^c$. Since $\Pr(B) = 0.3$, it follows that $\Pr(G) = 0.7$.


3. (a) If $A$ and $B$ are disjoint then $B \subset A^c$ and $BA^c = B$, so $\Pr(BA^c) = \Pr(B) = 1/2$.

(b) If $A \subset B$, then $B = A \cup (BA^c)$ with $A$ and $BA^c$ disjoint. So $\Pr(B) = \Pr(A) + \Pr(BA^c)$. That is,
$1/2 = 1/3 + \Pr(BA^c)$, so $\Pr(BA^c) = 1/6$.

(c) According to Theorem 1.4.11, $B = (BA) \cup (BA^c)$. Also, $BA$ and $BA^c$ are disjoint, so $\Pr(B) =
\Pr(BA) + \Pr(BA^c)$. That is, $1/2 = 1/8 + \Pr(BA^c)$, so $\Pr(BA^c) = 3/8$.


4. Let $E_1$ be the event that student A fails and let $E_2$ be the event that student B fails. We want
$\Pr(E_1 \cup E_2)$. We are told that $\Pr(E_1) = 0.5$, $\Pr(E_2) = 0.2$, and $\Pr(E_1 E_2) = 0.1$. According to
Theorem 1.5.7, $\Pr(E_1 \cup E_2) = 0.5 + 0.2 - 0.1 = 0.6$.


5. Using the same notation as in Exercise 4, we now want $\Pr(E_1^c \cap E_2^c)$. According to Theorems 1.4.9
and 1.5.3, this equals $1 - \Pr(E_1 \cup E_2) = 0.4$.


6. Using the same notation as in Exercise 4, we now want $\Pr([E_1 \cap E_2^c] \cup [E_1^c \cap E_2])$. These two events are
disjoint, so
\[ \Pr([E_1 \cap E_2^c] \cup [E_1^c \cap E_2]) = \Pr(E_1 \cap E_2^c) + \Pr(E_1^c \cap E_2). \]
Use the reasoning from part (c) of Exercise 3 above to conclude that

\[ \Pr(E_1 \cap E_2^c) = \Pr(E_1) - \Pr(E_1 \cap E_2) = 0.4, \]
\[ \Pr(E_1^c \cap E_2) = \Pr(E_2) - \Pr(E_1 \cap E_2) = 0.1. \]

It follows that the probability we want is 0.5.


7. Rearranging terms in Eq. (1.5.1) of the text, we get

\[ \Pr(A \cap B) = \Pr(A) + \Pr(B) - \Pr(A \cup B) = 0.4 + 0.7 - \Pr(A \cup B) = 1.1 - \Pr(A \cup B). \]

So $\Pr(A \cap B)$ is largest when $\Pr(A \cup B)$ is smallest and vice versa. The smallest possible value for
$\Pr(A \cup B)$ occurs when one of the events is a subset of the other. In the present exercise this could only
happen if $A \subset B$, in which case $\Pr(A \cup B) = \Pr(B) = 0.7$, and $\Pr(A \cap B) = 0.4$. The largest possible
value of $\Pr(A \cup B)$ occurs when either $A$ and $B$ are disjoint or when $A \cup B = S$. The former is not
possible since the probabilities are too large, but the latter is possible. In this case $\Pr(A \cup B) = 1$ and
$\Pr(A \cap B) = 0.1$.


8. Let $A$ be the event that a randomly selected family subscribes to the morning paper, and let $B$ be the
event that a randomly selected family subscribes to the afternoon paper. We are told that $\Pr(A) = 0.5$,
$\Pr(B) = 0.65$, and $\Pr(A \cup B) = 0.85$. We are asked to find $\Pr(A \cap B)$. Using Theorem 1.5.7 in the text
we obtain

\[ \Pr(A \cap B) = \Pr(A) + \Pr(B) - \Pr(A \cup B) = 0.5 + 0.65 - 0.85 = 0.3. \]
9. The required probability is

\[ \Pr(A \cap B^c) + \Pr(A^c B) = [\Pr(A) - \Pr(A \cap B)] + [\Pr(B) - \Pr(A \cap B)] = \Pr(A) + \Pr(B) - 2\Pr(A \cap B). \]


10. Theorem 1.4.11 says that $A = (A \cap B) \cup (A \cap B^c)$. Clearly the two events $A \cap B$ and $A \cap B^c$ are disjoint.
It follows from Theorem 1.5.6 that $\Pr(A) = \Pr(A \cap B) + \Pr(A \cap B^c)$.


11. (a) The set of points for which $(x - 1/2)^2 + (y - 1/2)^2 < 1/4$ is the interior of a circle that is contained
in the unit square. (Its center is $(1/2, 1/2)$ and its radius is $1/2$.) The area of this circle is $\pi/4$, so
the area of the remaining region (what we want) is $1 - \pi/4$.

(b) We need the area of the region between the two lines $y = 1/2 - x$ and $y = 3/2 - x$. The remaining
area is the union of two right triangles with base and height both equal to $1/2$. Each triangle has
area $1/8$, so the region between the two lines has area $1 - 2/8 = 3/4$.

(c) We can use calculus to do this. We want the area under the curve $y = 1 - x^2$ between $x = 0$ and
$x = 1$. This equals

\[ \int_0^1 (1 - x^2)\,dx = \left[ x - \frac{x^3}{3} \right]_{x=0}^{1} = \frac{2}{3}. \]

(d) The area of a line is 0, so the probability of a line segment is 0.
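The areas in parts (a)-(c) of Exercise 11 can be checked numerically. The sketch below estimates each area as the fraction of cell midpoints of a fine grid on the unit square that fall in the region. The region definitions are inferred from the solutions above rather than quoted from the exercise, and the function name `grid_fraction` and the grid size are assumptions made for this illustration.

```python
import math

def grid_fraction(in_region, n=400):
    """Fraction of the n*n cell midpoints of the unit square inside the region."""
    hits = 0
    for i in range(n):
        x = (i + 0.5) / n
        for j in range(n):
            y = (j + 0.5) / n
            if in_region(x, y):
                hits += 1
    return hits / (n * n)

# (a) Outside the circle of radius 1/2 centered at (1/2, 1/2).
area_a = grid_fraction(lambda x, y: (x - 0.5) ** 2 + (y - 0.5) ** 2 >= 0.25)
# (b) Between the lines y = 1/2 - x and y = 3/2 - x.
area_b = grid_fraction(lambda x, y: 0.5 < x + y < 1.5)
# (c) Under the curve y = 1 - x^2.
area_c = grid_fraction(lambda x, y: y <= 1 - x ** 2)

print(area_a, 1 - math.pi / 4)
print(area_b, 3 / 4)
print(area_c, 2 / 3)
```

The grid estimate is deterministic, so repeated runs give the same values; it only approximates the exact areas, and the agreement improves as `n` grows.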


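12. The events $B_1, B_2, \ldots$ are disjoint, because the event $B_1$ contains the points in $A_1$, the event $B_2$ contains
the points in $A_2$ but not in $A_1$, the event $B_3$ contains the points in $A_3$ but not in $A_1$ or $A_2$, etc. By
this same reasoning, it is seen that $\bigcup_{i=1}^{n} A_i = \bigcup_{i=1}^{n} B_i$ and $\bigcup_{i=1}^{\infty} A_i = \bigcup_{i=1}^{\infty} B_i$. Therefore,

\[ \Pr\left( \bigcup_{i=1}^{n} A_i \right) = \Pr\left( \bigcup_{i=1}^{n} B_i \right) \]

and

\[ \Pr\left( \bigcup_{i=1}^{\infty} A_i \right) = \Pr\left( \bigcup_{i=1}^{\infty} B_i \right). \]

However, since the events $B_1, B_2, \ldots$ are disjoint,

\[ \Pr\left( \bigcup_{i=1}^{n} B_i \right) = \sum_{i=1}^{n} \Pr(B_i) \]

and

\[ \Pr\left( \bigcup_{i=1}^{\infty} B_i \right) = \sum_{i=1}^{\infty} \Pr(B_i). \]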
13. We know from Exercise 12 that

\[ \Pr\left( \bigcup_{i=1}^{n} A_i \right) = \sum_{i=1}^{n} \Pr(B_i). \]

Furthermore, from the definition of the events $B_1, \ldots, B_n$ it is seen that $B_i \subset A_i$ for $i = 1, \ldots, n$.
Therefore, by Theorem 1.5.4, $\Pr(B_i) \le \Pr(A_i)$ for $i = 1, \ldots, n$. It now follows that

\[ \Pr\left( \bigcup_{i=1}^{n} A_i \right) \le \sum_{i=1}^{n} \Pr(A_i). \]

(Of course, if the events $A_1, \ldots, A_n$ are disjoint, there is equality in this relation.)

For the second part, apply the first part with $A_i$ replaced by $A_i^c$ for $i = 1, \ldots, n$. We get

\[ \Pr\left( \bigcup_{i=1}^{n} A_i^c \right) \le \sum_{i=1}^{n} \Pr(A_i^c). \qquad \text{(S.1.1)} \]

Exercise 5 in Sec. 1.4 says that the left side of (S.1.1) is $\Pr\left( \left[ \bigcap_{i=1}^{n} A_i \right]^c \right)$. Theorem 1.5.3 says that this last
probability is $1 - \Pr\left( \bigcap_{i=1}^{n} A_i \right)$. Hence, we can rewrite (S.1.1) as

\[ 1 - \Pr\left( \bigcap_{i=1}^{n} A_i \right) \le \sum_{i=1}^{n} \Pr(A_i^c). \]

Finally, taking one minus both sides of the above inequality (which reverses the inequality) produces
the desired result.


14. First, note that the probability of type AB blood is $1 - (0.5 + 0.34 + 0.12) = 0.04$ by using Theorems 1.5.2
and 1.5.3.

(a) The probability of blood reacting to anti-A is the probability that the blood is either type A or
type AB. Since these are disjoint events, the probability is the sum of the two probabilities, namely
$0.34 + 0.04 = 0.38$. Similarly, the probability of reacting with anti-B is the probability of being
either type B or type AB, $0.12 + 0.04 = 0.16$.

(b) The probability that both antigens react is the probability of type AB blood, namely 0.04.


1.6 Finite Sample Spaces 


Solutions to Exercises 


1. The safe way to obtain the answer at this stage of our development is to count that 18 of the 36
outcomes in the sample space yield an odd sum. Another way to solve the problem is to note that
regardless of what number appears on the first die, there are three numbers on the second die that will
yield an odd sum and three numbers that will yield an even sum. Either way the probability is 1/2.


2. The event whose probability we want is the complement of the event in Exercise 1, so the probability
is also 1/2.

3. The only differences greater than or equal to 3 that are available are 3, 4, and 5. These large differences
only occur for the six outcomes in the upper right and the six outcomes in the lower left of the array
in Example 1.6.5 of the text. So the probability we want is $1 - 12/36 = 2/3$.
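The counts used in Exercises 1-3 can be confirmed by listing all 36 equally likely outcomes for two dice. This is a minimal sketch; the variable names are assumptions introduced here, and the event in Exercise 3 is taken to be "the difference is less than 3", as the solution above implies.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))

def probability(event):
    """Probability of an event under equally likely outcomes."""
    favorable = sum(1 for a, b in outcomes if event(a, b))
    return Fraction(favorable, len(outcomes))

p_odd_sum = probability(lambda a, b: (a + b) % 2 == 1)      # Exercise 1
p_even_sum = probability(lambda a, b: (a + b) % 2 == 0)     # Exercise 2
p_small_diff = probability(lambda a, b: abs(a - b) < 3)     # Exercise 3

print(p_odd_sum, p_even_sum, p_small_diff)
```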


4. Let $x$ be the proportion of the school in grade 3 (the same as grades 2-6). Then $2x$ is the proportion in
grade 1 and $1 = 2x + 5x = 7x$. So $x = 1/7$, which is the probability that a randomly selected student
will be in grade 3.

5. The probability of being in an odd-numbered grade is $2x + x + x = 4x = 4/7$.


6. Assume that all eight possible combinations of faces are equally likely. Only two of those combinations
have all three faces the same, so the probability is 1/4.


7. The possible genotypes of the offspring are aa and Aa, since one parent will definitely contribute an
a, while the other can contribute either A or a. Since the parent who is Aa contributes each possible
allele with probability 1/2, the probabilities of the two possible offspring are each 1/2 as well.

8. (a) The sample space contains 12 outcomes: (Head, 1), (Tail, 1), (Head, 2), (Tail, 2), etc.

(b) Assume that all 12 outcomes are equally likely. Three of the outcomes have Head and an odd
number, so the probability is 1/4.


1.7 Counting Methods 


Commentary 


If you wish to stress computer evaluation of probabilities, then there are programs for computing factorials 
and log-factorials. For example, in the statistical software R, there are functions factorial and lfactorial 
that compute these. If you cover Stirling’s formula (Theorem 1.7.5), you can use these functions to illustrate 


the closeness of the approximation. 
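For readers working outside R, the same illustration can be sketched in Python, where `math.factorial` and `math.lgamma(n + 1)` play the roles of R's `factorial` and `lfactorial`. The helper names `stirling`, `stirling_ratio`, and `log_stirling_error` are assumptions introduced for this example.

```python
import math

def stirling(n):
    """Stirling's approximation (2*pi)^(1/2) * n^(n + 1/2) * e^(-n) to n!."""
    return math.sqrt(2 * math.pi) * n ** (n + 0.5) * math.exp(-n)

def stirling_ratio(n):
    """Ratio of the approximation to the exact factorial; it tends to 1."""
    return stirling(n) / math.factorial(n)

def log_stirling_error(n):
    """log(n!) minus the log of Stirling's approximation, via the log-factorial."""
    log_approx = 0.5 * math.log(2 * math.pi) + (n + 0.5) * math.log(n) - n
    return math.lgamma(n + 1) - log_approx

for n in (1, 5, 10, 50, 100):
    print(n, stirling_ratio(n), log_stirling_error(n))
```

Working on the log scale avoids overflow for large `n`, which is why the log-factorial version is the one to use when `n` is in the thousands.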


Solutions to Exercises 


1. Each pair of starting day and leap year/no leap year designation determines a calendar, and each
calendar corresponds to exactly one such pair. Since there are seven days and two designations, there
are a total of $7 \times 2 = 14$ different calendars.


2. There are 20 ways to choose the student from the first class, and no matter which is chosen, there are 18
ways to choose the student from the second class. No matter which two students are chosen from the first
two classes, there are 25 ways to choose the student from the third class. The multiplication rule can be
applied to conclude that the total number of ways to choose the three members is $20 \times 18 \times 25 = 9000$.

3. This is a simple matter of permutations of five distinct items, so there are $5! = 120$ ways.

4. There are six different possible shirts, and no matter what shirt is picked, there are four different slacks.
So there are 24 different combinations.


5. Let the sample space consist of all four-tuples of dice rolls. There are $6^4 = 1296$ possible outcomes.
The outcomes with all four rolls different consist of all of the permutations of six items taken four at a
time. There are $P_{6,4} = 360$ of these outcomes. So the probability we want is $360/1296 = 5/18$.

6. With six rolls, there are $6^6 = 46656$ possible outcomes. The outcomes with all different rolls are
the permutations of six distinct items. There are $6! = 720$ outcomes in the event of interest, so the
probability is $720/46656 = 0.01543$.

7. There are $20^{12}$ possible outcomes in the sample space. If the 12 balls are to be thrown into different
boxes, the first ball can be thrown into any one of the 20 boxes, the second ball can then be thrown
into any one of the other 19 boxes, etc. Thus, there are $20 \cdot 19 \cdot 18 \cdots 9$ possible outcomes in the event.
So the probability is $20!/[8!\,20^{12}]$.
8. There are $7^5$ possible outcomes in the sample space. If the five passengers are to get off at different
floors, the first passenger can get off at any one of the seven floors, the second passenger can then get
off at any one of the other six floors, etc. Thus, the probability is

\[ \frac{7 \cdot 6 \cdot 5 \cdot 4 \cdot 3}{7^5} = \frac{360}{2401}. \]
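Exercises 5-8 all have the same form: the probability that $k$ draws from $n$ equally likely values are all different is $P_{n,k}/n^k$. The following sketch evaluates that ratio exactly; the function name `prob_all_distinct` is an assumption introduced here, and `math.perm` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import factorial, perm

def prob_all_distinct(num_values, num_draws):
    """P_{n,k} / n^k: probability that num_draws draws are all different."""
    return Fraction(perm(num_values, num_draws), num_values ** num_draws)

four_dice = prob_all_distinct(6, 4)     # Exercise 5
six_dice = prob_all_distinct(6, 6)      # Exercise 6
balls = prob_all_distinct(20, 12)       # Exercise 7
elevator = prob_all_distinct(7, 5)      # Exercise 8

# Exercise 7 written in the factorial form 20!/(8! 20^12).
balls_check = Fraction(factorial(20), factorial(8) * 20 ** 12)

print(four_dice, float(six_dice), float(balls), elevator)
```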


9. There are $6!$ possible arrangements in which the six runners can finish the race. If the three runners
from team A finish in the first three positions, there are $3!$ arrangements of these three runners among
these three positions and there are also $3!$ arrangements of the three runners from team B among the
last three positions. Therefore, there are $3! \times 3!$ arrangements in which the runners from team A
finish in the first three positions and the runners from team B finish in the last three positions. Thus,
the probability is $(3!\,3!)/6! = 1/20$.

10. We can imagine that the 100 balls are randomly ordered in a list, and then drawn in that order. Thus,
the required probability in part (a), (b), or (c) of this exercise is simply the probability that the first,
fiftieth, or last ball in the list is red. Each of these probabilities is the same, $r/100$, because of the random
order of the list.


11. In terms of factorials, $P_{n,k} = n!/(n - k)!$. Since we are assuming that $n$ and $n - k$ are large, we
can use Stirling's formula to approximate both factorials. The approximation to $n!$ is $(2\pi)^{1/2} n^{n+1/2} e^{-n}$
and the approximation to $(n - k)!$ is $(2\pi)^{1/2} (n - k)^{n-k+1/2} e^{-n+k}$. The approximation to the ratio is
the ratio of the approximations because the ratio of each approximation to its corresponding factorial
converges to 1. That is,

\[ \frac{n!}{(n - k)!} \approx \frac{(2\pi)^{1/2} n^{n+1/2} e^{-n}}{(2\pi)^{1/2} (n - k)^{n-k+1/2} e^{-n+k}} = e^{-k} n^{k} \left( 1 - \frac{k}{n} \right)^{-n+k-1/2}. \]

Further simplification is available if one assumes that $k$ is small compared to $n$, that is $k/n \approx 0$. In this
case, the last factor is approximately $e^{k}$, and the whole approximation simplifies to $n^k$. This makes
sense because, if $n/(n - k)$ is essentially 1, then the product of the $k$ largest factors in $n!$ is essentially
$n^k$.


1.8 Combinatorial Methods 


Commentary 


This section ends with an extended example called “The Tennis Tournament”. This is an application of 
combinatorics that uses a slightly subtle line of reasoning. 


Solutions to Exercises 


1. We have to assign 10 houses to one pollster, and the other pollster will get to canvass the other 10
houses. Hence, the number of assignments is the number of combinations of 20 items taken 10 at a
time, $\binom{20}{10} = 184{,}756$.

2. The ratio of $\binom{93}{30}$ to $\binom{93}{31}$ is $31/63 < 1$, so $\binom{93}{31}$ is larger.
3. Since $93 = 63 + 30$, the two numbers are the same.

4. Let the sample space consist of all subsets (not ordered tuples) of the 24 bulbs in the box. There are
$\binom{24}{4} = 10626$ such subsets. There is only one subset that has all four defectives, so the probability we
want is $1/10626$.

5. The number is $\dfrac{4251!}{97!\,4154!} = \binom{4251}{97}$, an integer.


6. There are $\binom{n}{2}$ possible pairs of seats that A and B can occupy. Of these pairs, $n - 1$ pairs comprise
two adjacent seats. Therefore, the probability is

\[ \frac{n - 1}{\binom{n}{2}} = \frac{2}{n}. \]

7. There are $\binom{n}{k}$ possible sets of $k$ seats to be occupied, and they are all equally likely. There are $n - k + 1$
sets of $k$ adjacent seats, so the probability we want is

\[ \frac{n - k + 1}{\binom{n}{k}} = \frac{(n - k + 1)!\,k!}{n!}. \]

8. There are $\binom{n}{k}$ possible sets of $k$ seats to be occupied, and they are all equally likely. Because the circle
has no start or end, there are $n$ sets of $k$ adjacent seats, so the probability we want is

\[ \frac{n}{\binom{n}{k}} = \frac{(n - k)!\,k!}{(n - 1)!}. \]


9. This problem is slightly tricky. The total number of ways of choosing the $n$ seats that will be occupied
by the $n$ people is $\binom{2n}{n}$. Offhand, it would seem that there are only two ways of choosing these seats
so that no two adjacent seats are occupied, namely:
XOXO...XO and OXOX...OX

Upon further consideration, however, $n - 1$ more ways can be found, namely:
XOOXOX...OX, XOXOOXOX...OX, etc.

Therefore, the total number of ways of choosing the seats so that no two adjacent seats are occupied is
$n + 1$. The probability is $(n + 1)/\binom{2n}{n}$.
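Because the count $n + 1$ in Exercise 9 is easy to get wrong, a brute-force check for small $n$ is reassuring. The sketch below assumes, as the patterns above suggest, that the $2n$ seats are in a row; the function name `count_no_adjacent` is an assumption introduced for this example.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def count_no_adjacent(n):
    """Number of ways to occupy n of 2n seats in a row with no two adjacent."""
    count = 0
    for seats in combinations(range(2 * n), n):
        # combinations() yields increasing tuples, so compare neighbors.
        if all(b - a > 1 for a, b in zip(seats, seats[1:])):
            count += 1
    return count

for n in range(1, 7):
    ways = count_no_adjacent(n)
    print(n, ways, Fraction(ways, comb(2 * n, n)))
```

The enumeration grows like $\binom{2n}{n}$, so it is practical only for small $n$, which is all a sanity check needs.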
10. We shall let the sample space consist of all subsets (unordered) of 10 out of the 24 light bulbs in the
box. There are $\binom{24}{10}$ such subsets. The number of subsets that contain the two defective bulbs is the
number of subsets of size 8 out of the other 22 bulbs, $\binom{22}{8}$, so the probability we want is

\[ \frac{\binom{22}{8}}{\binom{24}{10}} = \frac{10 \times 9}{24 \times 23} = 0.1630. \]


11. This exercise is similar to Exercise 10. Let the sample space consist of all subsets (unordered) of 12 out
of the 100 people in the group. There are $\binom{100}{12}$ such subsets. The number of subsets that contain A
and B is the number of subsets of size 10 out of the other 98 people, $\binom{98}{10}$, so the probability we want
is

\[ \frac{\binom{98}{10}}{\binom{100}{12}} = \frac{12 \times 11}{100 \times 99} = 0.01333. \]
12. There are $\binom{35}{10}$ ways of dividing the group into the two teams. As in Exercise 11, the number of ways
of choosing the 10 players for the first team so as to include both A and B is $\binom{33}{8}$. The number of
ways of choosing the 10 players for this team so as not to include either A or B (A and B will then be
together on the other team) is $\binom{33}{10}$. The probability we want is then

\[ \frac{\binom{33}{8} + \binom{33}{10}}{\binom{35}{10}} = \frac{10 \times 9 + 25 \times 24}{35 \times 34} = 0.5798. \]
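The binomial-coefficient ratios in Exercises 10-12 and their simplified forms can be checked exactly with `math.comb`. This is only a verification sketch; the variable names are assumptions introduced here.

```python
from fractions import Fraction
from math import comb

# Exercise 10: both defective bulbs among the 10 chosen from 24.
p_ex10 = Fraction(comb(22, 8), comb(24, 10))
# Exercise 11: A and B both among the 12 chosen from 100.
p_ex11 = Fraction(comb(98, 10), comb(100, 12))
# Exercise 12: A and B on the same team (teams of 10 and 25 from 35).
p_ex12 = Fraction(comb(33, 8) + comb(33, 10), comb(35, 10))

print(p_ex10, float(p_ex10))
print(p_ex11, float(p_ex11))
print(p_ex12, float(p_ex12))
```

Using `Fraction` keeps the arithmetic exact, so equality with the simplified ratios such as $(10 \times 9)/(24 \times 23)$ can be tested directly rather than up to rounding.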


13. This exercise is similar to Exercise 12. Here, we want four designated bulbs to be in the same group.
The probability is

\[ \frac{\binom{20}{6} + \binom{20}{10}}{\binom{24}{10}}. \]
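14.
\[ \binom{n}{k} + \binom{n}{k-1} = \frac{n!}{k!\,(n-k)!} + \frac{n!}{(k-1)!\,(n-k+1)!} \]
\[ = \frac{n!}{(k-1)!\,(n-k)!} \left( \frac{1}{k} + \frac{1}{n-k+1} \right) \]
\[ = \frac{n!}{(k-1)!\,(n-k)!} \cdot \frac{n+1}{k(n-k+1)} \]
\[ = \frac{(n+1)!}{k!\,(n-k+1)!} = \binom{n+1}{k}. \]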

15. (a) If we express $2^n$ as $(1 + 1)^n$ and expand $(1 + 1)^n$ by the binomial theorem, we obtain the desired
result.

(b) If we express 0 as $(1 - 1)^n$ and expand $(1 - 1)^n$ by the binomial theorem, we obtain the desired
result.


16. (a) It is easier to calculate first the probability that the committee will not contain either of the two
senators from the designated state. This probability is $\binom{98}{8} \big/ \binom{100}{8}$. Thus, the final answer is

\[ 1 - \frac{\binom{98}{8}}{\binom{100}{8}} = 1 - 0.8457 = 0.1543. \]

(b) There are $\binom{100}{50}$ combinations that might be chosen. If the group is to contain one senator from
each state, then there are two possible choices for each of the fifty states. Hence, the number of
possible combinations containing one senator from each state is $2^{50}$.


17. Call the four players A, B, C, and D. The number of ways of choosing the positions in the deck that
will be occupied by the four aces is $\binom{52}{4}$. Since player A will receive 13 cards, the number of ways
of choosing the positions in the deck for the four aces so that all of them will be received by player
A is $\binom{13}{4}$. Similarly, since player B will receive 13 other cards, the number of ways of choosing the
positions for the four aces so that all of them will be received by player B is $\binom{13}{4}$. A similar result is
true for each of the other players. Therefore, the total number of ways of choosing the positions in the
deck for the four aces so that all of them will be received by the same player is $4\binom{13}{4}$. Thus, the final
probability is $4\binom{13}{4} \big/ \binom{52}{4}$.
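The numerical values in Exercises 16(a) and 17 can be reproduced with exact integer arithmetic. This sketch assumes only the counts stated in the solutions; the variable names are introduced here for illustration.

```python
from fractions import Fraction
from math import comb

# Exercise 16(a): committee of 8 from 100 senators that includes at least
# one of the two senators from the designated state.
p_neither = Fraction(comb(98, 8), comb(100, 8))
p_at_least_one = 1 - p_neither

# Exercise 17: all four aces go to the same one of four players.
p_same_player = Fraction(4 * comb(13, 4), comb(52, 4))

print(float(p_neither), float(p_at_least_one))
print(p_same_player, float(p_same_player))
```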


100 20 
There are ( 10 ways of choosing ten mathematics students. There are (: ways of choosing two 
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19. 


20. 
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5 
20 
students from a given class of 20 students. Therefore, there are ( : ways of choosing two students 


5 

from each of the five classes. So, the final answer is (?') i (in) =~ 0.0143. 

19. From the description of what counts as a collection of customer choices, we see that each collection
consists of a tuple $(m_1,\ldots,m_n)$, where $m_i$ is the number of customers who choose item $i$ for $i = 1,\ldots,n$.
Each $m_i$ must be between 0 and $k$ and $m_1+\cdots+m_n = k$. Each such tuple is equivalent to a sequence
of $n+k-1$ 0's and 1's as follows. The first $m_1$ terms are 0 followed by a 1. The next $m_2$ terms are 0
followed by a 1, and so on up to $m_{n-1}$ 0's followed by a 1 and finally $m_n$ 0's. Since $m_1+\cdots+m_n = k$
and since we are putting exactly $n-1$ 1's into the sequence, each such sequence has exactly $n+k-1$
terms. Also, it is clear that each such sequence corresponds to exactly one tuple of customer choices.
The numbers of 0's between successive 1's give the numbers of customers who choose that item, and
the 1's indicate where we switch from one item to the next. So, the number of combinations of choices
is the number of such sequences: $\binom{n+k-1}{k}$.
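The counting argument above can be checked numerically for small cases. The following is a minimal Python sketch, not part of the original solution; the function names are illustrative. It enumerates all tuples $(m_1,\ldots,m_n)$ of nonnegative integers with sum $k$ and compares the count with $\binom{n+k-1}{k}$.

```python
from itertools import product
from math import comb


def count_choice_tuples(n: int, k: int) -> int:
    """Brute-force count of tuples (m_1, ..., m_n) of nonnegative integers summing to k."""
    return sum(1 for m in product(range(k + 1), repeat=n) if sum(m) == k)


def closed_form(n: int, k: int) -> int:
    """Count given by the 0/1-sequence argument: binomial(n + k - 1, k)."""
    return comb(n + k - 1, k)


for n, k in [(2, 3), (3, 4), (4, 5)]:
    print(n, k, count_choice_tuples(n, k), closed_form(n, k))
```

The brute-force enumeration grows like $(k+1)^n$, so it is only practical for small $n$ and $k$; the closed form has no such limit.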


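20. We shall use induction. For $n = 1$, we must prove that

$$x + y = \binom{1}{0}x^0y^1 + \binom{1}{1}x^1y^0.$$

Since the right side of this equation is $x + y$, the theorem is true for $n = 1$. Now assume that the
theorem is true for each $n = 1,\ldots,n_0$ for $n_0 \ge 1$. For $n = n_0 + 1$, the theorem says

$$(x+y)^{n_0+1} = \sum_{k=0}^{n_0+1}\binom{n_0+1}{k}x^k y^{n_0+1-k}. \qquad (S.1.2)$$

Since we have assumed that the theorem is true for $n = n_0$, we know that

$$(x+y)^{n_0} = \sum_{k=0}^{n_0}\binom{n_0}{k}x^k y^{n_0-k}. \qquad (S.1.3)$$

We shall multiply both sides of (S.1.3) by $x + y$. We then need to prove that $x + y$ times the right side
of (S.1.3) equals the right side of (S.1.2).

$$(x+y)(x+y)^{n_0} = (x+y)\sum_{k=0}^{n_0}\binom{n_0}{k}x^k y^{n_0-k}$$
$$= \sum_{k=0}^{n_0}\binom{n_0}{k}x^{k+1}y^{n_0-k} + \sum_{k=0}^{n_0}\binom{n_0}{k}x^k y^{n_0+1-k}$$
$$= \sum_{k=1}^{n_0+1}\binom{n_0}{k-1}x^k y^{n_0+1-k} + \sum_{k=0}^{n_0}\binom{n_0}{k}x^k y^{n_0+1-k}$$
$$= y^{n_0+1} + \sum_{k=1}^{n_0}\left[\binom{n_0}{k-1} + \binom{n_0}{k}\right]x^k y^{n_0+1-k} + x^{n_0+1}.$$

Now, apply the result in Exercise 14 to conclude that

$$\binom{n_0}{k-1} + \binom{n_0}{k} = \binom{n_0+1}{k}.$$

This makes the final summation above equal to the right side of (S.1.2).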


21. We are asked for the number of unordered samples with replacement, as constructed in Exercise 19.
Here, $n = 365$, so there are $\binom{365+k-1}{k}$ different unordered sets of $k$ birthdays chosen with replacement
from $1,\ldots,365$.


22. The approximation to $n!$ is $(2\pi)^{1/2}n^{n+1/2}e^{-n}$, and the approximation to $(n/2)!$ is $(2\pi)^{1/2}(n/2)^{n/2+1/2}e^{-n/2}$.
Then

$$\frac{n!}{[(n/2)!]^2} \approx \frac{(2\pi)^{1/2}n^{n+1/2}e^{-n}}{\left[(2\pi)^{1/2}(n/2)^{n/2+1/2}e^{-n/2}\right]^2} = (2\pi)^{-1/2}\,2^{n+1}\,n^{-1/2}.$$

With $n = 500$, the approximation is $e^{343.24}$, too large to represent on a calculator with only two-digit
exponents. The actual number is about 1/20 of 1% smaller than the approximation.
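Because the ratio overflows ordinary calculators, it is natural to compare the approximation and the exact value on the log scale. The following is a minimal Python sketch (an illustration, not part of the original solution; the function names are hypothetical) that uses the standard-library log-gamma function, with $\log n! = \texttt{lgamma}(n+1)$.

```python
import math


def log_stirling_ratio(n: int) -> float:
    """Log of the Stirling-based approximation (2*pi)^(-1/2) * 2^(n+1) * n^(-1/2)."""
    return -0.5 * math.log(2 * math.pi) + (n + 1) * math.log(2) - 0.5 * math.log(n)


def log_exact_ratio(n: int) -> float:
    """Log of n! / ((n/2)!)^2 for even n, computed through log-gamma."""
    return math.lgamma(n + 1) - 2 * math.lgamma(n // 2 + 1)


n = 500
log_approx = log_stirling_ratio(n)
log_exact = log_exact_ratio(n)
# Relative difference (exact - approximation) / approximation.
rel_diff = math.expm1(log_exact - log_approx)
print(f"log of approximation: {log_approx:.4f}")
print(f"log of exact value:   {log_exact:.4f}")
print(f"relative difference:  {rel_diff:.6f}")
```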


1.9 Multinomial Coefficients 


Commentary 


Multinomial coefficients are useful as a counting method, and they are needed for the definition of the 
multinomial distribution in Sec. 5.9. They are not used much elsewhere in the text. Although this section 
does not have an asterisk, it could be skipped (together with Sec. 5.9) if one were not interested in the 
multinomial distribution or the types of counting arguments that rely on multinomial coefficients. 


Solutions to Exercises 


1. We have three types of elements that need to be assigned to 21 houses so that exactly seven of each
type are assigned. The number of ways to do this is the multinomial coefficient

$$\binom{21}{7,7,7} = 399{,}072{,}960.$$


2. We are asked for the number of arrangements of four distinct types of objects with 18 of one type, 12
of the next, 8 of the next and 12 of the last. This is the multinomial coefficient $\binom{50}{18,12,8,12}$.


3. We need to divide the 300 members of the organization into three subsets: the 5 in one committee, the
8 in the second committee, and the 287 in neither committee. There are $\binom{300}{5,8,287}$ ways to do this.


4. There are $\binom{10}{3,3,2,1,1}$ arrangements of the 10 letters of five distinct types (three s's, three t's, two i's,
one a, and one c). All of them are equally likely, and only one spells statistics. So, the probability is
$1\big/\binom{10}{3,3,2,1,1} = 1/50400$.


5. There are $\binom{n}{n_1,n_2,n_3,n_4,n_5,n_6}$ many ways to arrange $n_j$ $j$'s (for $j = 1,\ldots,6$) among the $n$ rolls. The
number of possible equally likely rolls is $6^n$. So, the probability we want is $\frac{1}{6^n}\binom{n}{n_1,n_2,n_3,n_4,n_5,n_6}$.
6. There are $6^7$ possible outcomes for the seven dice. If each of the six numbers is to appear at least once
among the seven dice, then one number must appear twice and each of the other five numbers must
appear once. Suppose first that the number 1 appears twice and each of the other numbers appears
once. The number of outcomes of this type in the sample space is equal to the number of different
arrangements of the symbols 1, 1, 2, 3, 4, 5, 6, which is $\frac{7!}{2!(1!)^5} = \frac{7!}{2}$. There is an equal number of
outcomes for each of the other five numbers which might appear twice among the seven dice. Therefore,
the total number of outcomes in which each number appears at least once is $\frac{6(7!)}{2}$, and the probability
of this event is

$$\frac{6(7!)}{(2)6^7} = \frac{7!}{2(6^6)}.$$


7. There are $\binom{25}{10,8,7}$ ways of distributing the 25 cards to the three players. There are $\binom{12}{6,2,4}$ ways
of distributing the 12 red cards to the players so that each receives the designated number of red
cards. There are then $\binom{13}{4,6,3}$ ways of distributing the other 13 cards to the players, so that each
receives the designated total number of cards. The product of these last two numbers of ways is,
therefore, the number of ways of distributing the 25 cards to the players so that each receives the
designated number of red cards and the designated total number of cards. So, the final probability is

$$\binom{12}{6,2,4}\binom{13}{4,6,3}\Big/\binom{25}{10,8,7}.$$


8. There are $\binom{52}{13,13,13,13}$ ways of distributing the cards to the four players. There are $\binom{12}{3,3,3,3}$
ways of distributing the 12 picture cards so that each player gets three. No matter which of these ways
we choose, there are $\binom{40}{10,10,10,10}$ ways to distribute the remaining 40 nonpicture cards so that each
player gets 10. So, the probability we need is

$$\frac{\binom{12}{3,3,3,3}\binom{40}{10,10,10,10}}{\binom{52}{13,13,13,13}} = \frac{\dfrac{12!}{(3!)^4}\,\dfrac{40!}{(10!)^4}}{\dfrac{52!}{(13!)^4}} \approx 0.0324.$$
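Multinomial coefficients of this size are tedious by hand but easy to evaluate exactly with integer arithmetic. The following is a minimal Python sketch (an illustration only; the helper name `multinomial` is hypothetical) that recomputes the coefficient in Exercise 1, the probability in Exercise 4, and the probability in Exercise 8.

```python
from math import factorial, prod


def multinomial(n: int, parts: list[int]) -> int:
    """Multinomial coefficient n! / (parts[0]! * parts[1]! * ...); parts must sum to n."""
    if sum(parts) != n:
        raise ValueError("the group sizes must add up to n")
    return factorial(n) // prod(factorial(p) for p in parts)


# Exercise 1: three types of elements, seven of each, assigned to 21 houses.
houses = multinomial(21, [7, 7, 7])

# Exercise 4: one favorable arrangement of the letters s,s,s,t,t,t,i,i,a,c.
pr_statistics = 1 / multinomial(10, [3, 3, 2, 1, 1])

# Exercise 8: each of four players receives exactly three picture cards.
pr_three_pictures_each = (
    multinomial(12, [3, 3, 3, 3])
    * multinomial(40, [10, 10, 10, 10])
    / multinomial(52, [13, 13, 13, 13])
)

print(houses)
print(pr_statistics)
print(pr_three_pictures_each)
```

The products are kept as exact integers until the final division, so no precision is lost in the intermediate factorials.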
9. There are $\binom{52}{13,13,13,13}$ ways of distributing the cards to the four players. Call these four players A,
B, C, and D. There is only one way of distributing the cards so that player A receives all red cards,
player B receives all yellow cards, player C receives all blue cards, and player D receives all green cards.
However, there are 4! ways of assigning the four colors to the four players and therefore there are 4!
ways of distributing the cards so that each player receives 13 cards of the same color. So, the probability
we need is

$$\frac{4!}{\binom{52}{13,13,13,13}} = \frac{4!\,(13!)^4}{52!}.$$
10. If we do not distinguish among boys with the same last name, then there are $\binom{9}{2,3,4}$ possible arrangements
of the nine boys. We are interested in the probability of a particular one of these arrangements.
So, the probability we need is

$$\frac{1}{\binom{9}{2,3,4}} = \frac{2!\,3!\,4!}{9!} \approx 7.937 \times 10^{-4}.$$

11. We shall use induction. Since we have already proven the binomial theorem, we know that the conclusion
to the multinomial theorem is true for every $n$ if $k = 2$. We shall use induction again, but this time
using $k$ instead of $n$. For $k = 2$, we already know the result is true. Suppose that the result is true for
all $k \le k_0$ and for all $n$. For $k = k_0 + 1$ and arbitrary $n$ we must show that

$$(x_1 + \cdots + x_{k_0+1})^n = \sum \binom{n}{n_1,\ldots,n_{k_0+1}} x_1^{n_1}\cdots x_{k_0+1}^{n_{k_0+1}}, \qquad (S.1.4)$$

where the summation is over all $n_1,\ldots,n_{k_0+1}$ such that $n_1 + \cdots + n_{k_0+1} = n$. Let $y_i = x_i$ for
$i = 1,\ldots,k_0-1$ and let $y_{k_0} = x_{k_0} + x_{k_0+1}$. We then have

$$(x_1 + \cdots + x_{k_0+1})^n = (y_1 + \cdots + y_{k_0})^n.$$

Since we have assumed that the theorem is true for $k = k_0$, we know that

$$(y_1 + \cdots + y_{k_0})^n = \sum \binom{n}{m_1,\ldots,m_{k_0}} y_1^{m_1}\cdots y_{k_0}^{m_{k_0}}, \qquad (S.1.5)$$

where the summation is over all $m_1,\ldots,m_{k_0}$ such that $m_1 + \cdots + m_{k_0} = n$. On the right side of (S.1.5),
substitute $x_{k_0} + x_{k_0+1}$ for $y_{k_0}$ and apply the binomial theorem to obtain

$$\sum \binom{n}{m_1,\ldots,m_{k_0}} x_1^{m_1}\cdots x_{k_0-1}^{m_{k_0-1}} \sum_{i=0}^{m_{k_0}} \binom{m_{k_0}}{i} x_{k_0}^{i}\, x_{k_0+1}^{m_{k_0}-i}. \qquad (S.1.6)$$

In (S.1.6), let $n_i = m_i$ for $i = 1,\ldots,k_0-1$, let $n_{k_0} = i$, and let $n_{k_0+1} = m_{k_0} - i$. Then, in the summation
in (S.1.6), $n_1 + \cdots + n_{k_0+1} = n$ if and only if $m_1 + \cdots + m_{k_0} = n$. Also, note that

$$\binom{n}{m_1,\ldots,m_{k_0}}\binom{m_{k_0}}{i} = \binom{n}{n_1,\ldots,n_{k_0+1}}.$$

So, (S.1.6) becomes

$$\sum \binom{n}{n_1,\ldots,n_{k_0+1}} x_1^{n_1}\cdots x_{k_0+1}^{n_{k_0+1}},$$

where this last sum is over all $n_1,\ldots,n_{k_0+1}$ such that $n_1 + \cdots + n_{k_0+1} = n$.
12. For each element $s'$ of $S'$, the elements of $S$ that lead to boxful $s'$ are all the different sequences of
elements of $s'$. That is, think of each $s'$ as an unordered set of 12 numbers chosen with replacement
from 1 to 7. For example, {1,1,2,3,3,3,5,6,7,7,7,7} is one such set. The following are some of
the elements of $S$ that lead to the same set $s'$: (1,1,2,3,3,3,5,6,7,7,7,7), (1,2,3,5,6,7,1,3,7,3,7,7),
(7,1,7,2,3,5,7,1,6,3,7,3). This problem is pretty much the same as that which leads to the definition
of multinomial coefficients. We are looking for the number of orderings of 12 digits chosen from the
numbers 1 to 7 that have two of 1, one of 2, three of 3, none of 4, one of 5, one of 6, and four of 7. This
is just $\binom{12}{2,1,3,0,1,1,4}$. For a general $s'$, for $i = 1,\ldots,7$, let $n_i(s')$ be the number of $i$'s in the box $s'$. Then
$n_1(s') + \cdots + n_7(s') = 12$, and the number of orderings of these numbers is

$$N(s') = \binom{12}{n_1(s'),\ldots,n_7(s')}.$$

The multinomial theorem tells us that

$$\sum_{\text{All } s'} N(s') = \sum \binom{12}{n_1,\ldots,n_7} 1^{n_1}\cdots 1^{n_7} = 7^{12},$$

where the sum is over all possible combinations of nonnegative integers $n_1,\ldots,n_7$ that add to 12. This
matches the number of outcomes in $S$.
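The identity in Exercise 12 can also be confirmed by direct enumeration. The following is a minimal Python sketch (illustrative; the function names are hypothetical) that lists every possible boxful as a vector of counts, computes the number of orderings of each, and compares the total with $7^{12}$.

```python
from math import factorial, prod


def compositions(total: int, parts: int):
    """Yield every tuple of `parts` nonnegative integers that sums to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def orderings(counts: tuple[int, ...]) -> int:
    """Number of distinct sequences with the given count of each digit."""
    return factorial(sum(counts)) // prod(factorial(c) for c in counts)


# The example boxful {1,1,2,3,3,3,5,6,7,7,7,7} has counts (2,1,3,0,1,1,4).
example = orderings((2, 1, 3, 0, 1, 1, 4))

# Sum the number of orderings over all boxfuls of 12 digits from 1..7.
total_sequences = sum(orderings(c) for c in compositions(12, 7))

print(example)
print(total_sequences, 7 ** 12)
```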


1.10 The Probability of a Union of Events 


Commentary 


This section ends with an example of the matching problem. This is an application of the formula for the 
probability of a union of an arbitrary number of events. It requires a long line of argument and contains an 
interesting limiting result. The example will be interesting to students with good mathematics backgrounds, 
but it might be too challenging for students who have struggled to master combinatorics. One can use 
statistical software, such as R, to help illustrate how close the approximation is. The formula (1.10.10) can
be computed as

ints=1:n

sum(exp(-lfactorial(ints))*(-1)^(ints+1))

where n has previously been assigned the value of $n$ for which one wishes to compute $p_n$.
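For readers who prefer Python, the same quantity can be computed exactly with rational arithmetic. This is a minimal sketch, assuming that Eq. (1.10.10) is the alternating sum $p_n = \sum_{i=1}^{n} (-1)^{i+1}/i!$ that the R expression above evaluates; the function name `p_match` is illustrative.

```python
import math
from fractions import Fraction


def p_match(n: int) -> Fraction:
    """Exact value of p_n = sum_{i=1}^{n} (-1)^(i+1) / i!."""
    return sum(Fraction((-1) ** (i + 1), math.factorial(i)) for i in range(1, n + 1))


limit = 1 - math.exp(-1)  # limiting value of p_n as n grows
for n in (3, 4, 10, 21):
    value = p_match(n)
    print(n, value if n <= 4 else float(value), float(value) - limit)
```

Exact fractions make it possible to compare values such as $p_{21}$ and $p_{53}$, whose difference is far below floating-point resolution.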


Solutions to Exercises 


1. Let $A_i$ be the event that person $i$ receives exactly two aces for $i = 1,2,3$. We want $\Pr(\cup_{i=1}^{3} A_i)$. We
shall apply Theorem 1.10.1 directly. Let the sample space consist of all permutations of the 52 cards
where the first five cards are dealt to person 1, the second five to person 2, and the third five to person
3. A permutation of 52 cards that leads to the occurrence of event $A_i$ can be constructed as follows.
First, choose which of person $i$'s five locations will receive the two aces. There are $C_{5,2}$ ways to do
this. Next, for each such choice, choose the two aces that will go in these locations, distinguishing the
order in which they are placed. There are $P_{4,2}$ ways to do this. Next, for each of the preceding choices,
choose the locations for the other two aces from among the 47 locations that are not dealt to person $i$,
distinguishing order. There are $P_{47,2}$ ways to do this. Finally, for each of the preceding choices, choose
a permutation of the remaining 48 cards among the remaining 48 locations. There are 48! ways to do
this. Since there are 52! equally likely permutations in the sample space, we have

$$\Pr(A_i) = \frac{C_{5,2}\,P_{4,2}\,P_{47,2}\,48!}{52!} = \frac{5!\,4!\,47!\,48!}{2!\,3!\,2!\,45!\,52!} \approx 0.0399.$$
Careful examination of the expression for $\Pr(A_i)$ reveals that it can also be expressed as

$$\Pr(A_i) = \frac{\binom{4}{2}\binom{48}{3}}{\binom{52}{5}}.$$

This expression corresponds to a different, but equally correct, way of describing the sample space in
terms of equally likely outcomes. In particular, the sample space would consist of the different possible
five-card sets that person $i$ could receive without regard to order.


Next, compute $\Pr(A_iA_j)$ for $i \ne j$. There are still $C_{5,2}$ ways to choose the locations for person $i$'s aces
amongst the five cards and for each such choice, there are $P_{4,2}$ ways to choose the two aces in order.
For each of the preceding choices, there are $C_{5,2}$ ways to choose the locations for person $j$'s aces and 2
ways to order the remaining two aces amongst the two locations. For each combination of the preceding
choices, there are 48! ways to arrange the remaining 48 cards in the 48 unassigned locations. Then,
$\Pr(A_iA_j)$ is

$$\Pr(A_iA_j) = \frac{2\,C_{5,2}^2\,P_{4,2}\,48!}{52!} = \frac{2(5!)^2\,4!\,48!}{(2!)^3(3!)^2\,52!} \approx 3.694 \times 10^{-4}.$$

Once again, we can rewrite the expression for $\Pr(A_iA_j)$ as

$$\Pr(A_iA_j) = \frac{\binom{4}{2}\binom{48}{3,3,42}}{\binom{52}{5,5,42}}.$$

This corresponds to treating the sample space as the set of all pairs of five-card subsets.


Next, notice that it is impossible for all three players to receive two aces, so $\Pr(A_1A_2A_3) = 0$. Applying
Theorem 1.10.1, we obtain

$$\Pr\left(\cup_{i=1}^{3} A_i\right) = 3 \times 0.0399 - 3 \times 3.694 \times 10^{-4} = 0.1186.$$
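The two counting descriptions in this solution can be cross-checked numerically. The following is a minimal Python sketch (illustrative only; the helper `multinomial` is hypothetical) that evaluates the subset-based expressions for $\Pr(A_i)$ and $\Pr(A_iA_j)$ and then applies Theorem 1.10.1. Because it uses unrounded inputs, its last printed digit can differ slightly from a hand computation that starts from 0.0399.

```python
from math import comb, factorial, prod


def multinomial(n: int, parts: tuple[int, ...]) -> int:
    """Multinomial coefficient n! / (parts[0]! * parts[1]! * ...)."""
    assert sum(parts) == n
    return factorial(n) // prod(factorial(p) for p in parts)


# Pr(A_i): a five-card hand with exactly two of the four aces.
pr_single = comb(4, 2) * comb(48, 3) / comb(52, 5)

# Pr(A_i A_j): two five-card hands that each hold exactly two aces.
pr_pair = comb(4, 2) * multinomial(48, (3, 3, 42)) / multinomial(52, (5, 5, 42))

# Inclusion-exclusion with three events; the triple intersection is impossible.
pr_union = 3 * pr_single - 3 * pr_pair

print(pr_single, pr_pair, pr_union)
```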


2. Let A, B, and C stand for the events that a randomly selected family subscribes to the newspaper
with the same name. Then $\Pr(A \cup B \cup C)$ is the proportion of families that subscribe to at least one
newspaper. According to Theorem 1.10.1, we can express this probability as

$$\Pr(A) + \Pr(B) + \Pr(C) - \Pr(A \cap B) - \Pr(A \cap C) - \Pr(B \cap C) + \Pr(A \cap B \cap C).$$

The probabilities in this expression are the proportions of families that subscribe to the various
combinations. These proportions are all stated in the exercise, so the formula yields

$$\Pr(A \cup B \cup C) = 0.6 + 0.4 + 0.3 - 0.2 - 0.1 - 0.2 + 0.05 = 0.85.$$


3. As seen from Fig. S.1.1, the required percentage is $P_1 + P_2 + P_3$. From the given values, we have, in
percentages,

$$P_7 = 5,$$
$$P_4 = 20 - P_7 = 15,$$
$$P_5 = 20 - P_7 = 15,$$
$$P_6 = 10 - P_7 = 5,$$
$$P_1 = 60 - P_4 - P_6 - P_7 = 35,$$
$$P_2 = 40 - P_4 - P_5 - P_7 = 5,$$
$$P_3 = 30 - P_5 - P_6 - P_7 = 5.$$

Therefore, $P_1 + P_2 + P_3 = 45$.

Figure S.1.1: Figure for Exercise 3 of Sec. 1.10.


4. This is a case of the matching problem with $n = 3$. We are asked to find $p_3$. By Eq. (1.10.10) in the
text, this equals

$$p_3 = 1 - \frac{1}{2} + \frac{1}{6} = \frac{2}{3}.$$


5. Determine first the probability that at least one guest will receive the proper hat. This probability is
the value $p_n$ specified in the matching problem, with $n = 4$, namely

$$p_4 = 1 - \frac{1}{2} + \frac{1}{6} - \frac{1}{24} = \frac{5}{8}.$$

So, the probability that no guest receives the proper hat is $1 - 5/8 = 3/8$.


6. Let $A_1$ denote the event that no red balls are selected, let $A_2$ denote the event that no white balls
are selected, and let $A_3$ denote the event that no blue balls are selected. The desired probability is
$\Pr(A_1 \cup A_2 \cup A_3)$ and we shall apply Theorem 1.10.1. The event $A_1$ will occur if and only if the ten
selected balls are either white or blue. Since there are 60 white and blue balls, out of a total of 90 balls,
we have $\Pr(A_1) = \binom{60}{10}\big/\binom{90}{10}$. Similarly, $\Pr(A_2)$ and $\Pr(A_3)$ have the same value. The event $A_1A_2$
will occur if and only if all ten selected balls are blue. Therefore, $\Pr(A_1A_2) = \binom{30}{10}\big/\binom{90}{10}$. Similarly,
$\Pr(A_2A_3)$ and $\Pr(A_1A_3)$ have the same value. Finally, the event $A_1A_2A_3$ will occur if and only if all
three colors are missing, which is obviously impossible. Therefore, $\Pr(A_1A_2A_3) = 0$. When these values
are substituted into Eq. (1.10.1), we obtain the desired probability,

$$\Pr(A_1 \cup A_2 \cup A_3) = 3\,\frac{\binom{60}{10}}{\binom{90}{10}} - 3\,\frac{\binom{30}{10}}{\binom{90}{10}}.$$


7. Let $A_1$ denote the event that no student from the freshman class is selected, and let $A_2$, $A_3$, and
$A_4$ denote the corresponding events for the sophomore, junior, and senior classes, respectively. The
probability that at least one student will be selected from each of the four classes is equal to
$1 - \Pr(A_1 \cup A_2 \cup A_3 \cup A_4)$. We shall evaluate $\Pr(A_1 \cup A_2 \cup A_3 \cup A_4)$ by applying Theorem 1.10.2. The event $A_1$
will occur if and only if the 15 selected students are sophomores, juniors, or seniors. Since there are
90 such students out of a total of 100 students, we have $\Pr(A_1) = \binom{90}{15}\big/\binom{100}{15}$. The values of $\Pr(A_i)$
for $i = 2,3,4$ can be obtained in a similar fashion. Next, the event $A_1A_2$ will occur if and only if the
15 selected students are juniors or seniors. Since there are a total of 70 juniors and seniors, we have
$\Pr(A_1A_2) = \binom{70}{15}\big/\binom{100}{15}$. The probability of each of the six events of the form $A_iA_j$ for $i < j$ can be
obtained in this way. Next the event $A_1A_2A_3$ will occur if and only if all 15 selected students are seniors.
Therefore, $\Pr(A_1A_2A_3) = \binom{40}{15}\big/\binom{100}{15}$. The probabilities of the events $A_1A_2A_4$ and $A_1A_3A_4$ can also
be obtained in this way. It should be noted, however, that $\Pr(A_2A_3A_4) = 0$ since it is impossible that
all 15 selected students will be freshmen. Finally, the event $A_1A_2A_3A_4$ is also obviously impossible, so
$\Pr(A_1A_2A_3A_4) = 0$. So, the probability we want is

$$1 - \left[\frac{\binom{90}{15} + \binom{80}{15} + \binom{70}{15} + \binom{60}{15}}{\binom{100}{15}} - \frac{\binom{70}{15} + \binom{60}{15} + \binom{50}{15} + \binom{50}{15} + \binom{40}{15} + \binom{30}{15}}{\binom{100}{15}} + \frac{\binom{40}{15} + \binom{30}{15} + \binom{20}{15}}{\binom{100}{15}}\right].$$
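The inclusion-exclusion sum in Exercise 7 has many terms, so a numerical check is helpful. The following is a minimal Python sketch. It assumes class sizes of 10 freshmen, 20 sophomores, 30 juniors, and 40 seniors, which is what the counts 90, 70, and 40 used in the solution imply; the variable and function names are illustrative.

```python
from itertools import combinations
from math import comb

# Assumed class sizes, inferred from the counts quoted in the solution.
sizes = {"freshman": 10, "sophomore": 20, "junior": 30, "senior": 40}
total = sum(sizes.values())
sample = 15


def pr_all_missing(classes: tuple[str, ...]) -> float:
    """Probability that none of the given classes appears among the selected students."""
    remaining = total - sum(sizes[c] for c in classes)
    # math.comb returns 0 when fewer than `sample` students remain.
    return comb(remaining, sample) / comb(total, sample)


# Theorem 1.10.2: alternating sum over all nonempty collections of classes.
pr_some_class_missing = sum(
    (-1) ** (r + 1) * sum(pr_all_missing(s) for s in combinations(sizes, r))
    for r in range(1, len(sizes) + 1)
)
pr_every_class_present = 1 - pr_some_class_missing
print(pr_every_class_present)
```

The impossible events (for example, all 15 students being freshmen) contribute zero automatically, which matches the remark in the solution.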


8. It is impossible to place exactly $n - 1$ letters in the correct envelopes, because if $n - 1$ letters are placed
correctly, then the $n$th letter must also be placed correctly.


9. Let $p_n = 1 - q_n$. As discussed in the text, $p_{10} < p_{300} < 0.63212 < p_{53} < p_{21}$. Since $p_n$ is smallest for
$n = 10$, then $q_n$ is largest for $n = 10$.


10. There is exactly one outcome in which only letter 1 is placed in the correct envelope, namely the
outcome in which letter 1 is correctly placed, letter 2 is placed in envelope 3, and letter 3 is placed in 
envelope 2. Similarly there is exactly one outcome in which only letter 2 is placed correctly, and one 
in which only letter 3 is placed correctly. Hence, of the 3! = 6 possible outcomes, 3 outcomes yield the 
result that exactly one letter is placed correctly. So, the probability is 3/6 = 1/2. 


11. Consider choosing 5 envelopes at random into which the 5 red letters will be placed. If there are exactly
$r$ red envelopes among the five selected envelopes ($r = 0,1,\ldots,5$), then exactly $x = 2r$ envelopes will
contain a card with a matching color. Hence, the only possible values of $x$ are 0, 2, 4, ..., 10. Thus,
for $x = 0,2,\ldots,10$ and $r = x/2$, the desired probability is the probability that there are exactly $r$ red
envelopes among the five selected envelopes, which is $\binom{5}{r}\binom{5}{5-r}\big/\binom{10}{5}$.


However, since $A_1 \subset A_2 \subset \cdots \subset A_n$, it follows that $\cup_{i=1}^{n} A_i = A_n$. Hence,

$$\Pr\left(\bigcup_{i=1}^{\infty} A_i\right) = \lim_{n\to\infty} \Pr(A_n).$$


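13. We know that

$$\bigcap_{i=1}^{\infty} A_i = \left(\bigcup_{i=1}^{\infty} A_i^c\right)^c.$$

Hence,

$$\Pr\left(\bigcap_{i=1}^{\infty} A_i\right) = 1 - \Pr\left(\bigcup_{i=1}^{\infty} A_i^c\right).$$

However, since $A_1 \supset A_2 \supset \cdots$, then $A_1^c \subset A_2^c \subset \cdots$. Therefore, by Exercise 12,

$$\Pr\left(\bigcup_{i=1}^{\infty} A_i^c\right) = \lim_{n\to\infty} \Pr(A_n^c) = \lim_{n\to\infty} [1 - \Pr(A_n)] = 1 - \lim_{n\to\infty} \Pr(A_n).$$

It now follows that

$$\Pr\left(\bigcap_{i=1}^{\infty} A_i\right) = \lim_{n\to\infty} \Pr(A_n).$$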


1.12 Supplementary Exercises 


Solutions to Exercises 


1. No, since both A and B might occur.

2. $\Pr(A^c \cap B^c \cap D^c) = \Pr[(A \cup B \cup D)^c] = 0.3$.

3. $\binom{250}{18}\binom{100}{12}\Big/\binom{350}{30}$.
4. There are $\binom{20}{10}$ ways of choosing 10 cards from the deck. For $j = 1,\ldots,5$, there are $\binom{4}{2}$ ways of choosing
two cards with the number $j$. Hence, the answer is

$$\binom{4}{2}^5\Big/\binom{20}{10}.$$


5. The region where total utility demand is at least 215 is shaded in Fig. S.1.2. The area of the shaded
region is

$$\frac{1}{2} \times 135 \times 135 = 9112.5.$$

The probability is then $9112.5/29204 = 0.3120$.

Figure S.1.2: Region where total utility demand is at least 215 in Exercise 5 of Sec. 1.12.


6. (a) There are $\binom{r+w}{r}$ possible positions that the red balls could occupy in the ordering as they are
drawn. Therefore, the probability that they will be in the first $r$ positions is $1\big/\binom{r+w}{r}$.

(b) There are $\binom{r+1}{r}$ ways that the red balls can occupy the first $r + 1$ positions in the ordering.
Therefore, the probability is $\binom{r+1}{r}\big/\binom{r+w}{r} = (r+1)\big/\binom{r+w}{r}$.
r r r 


7. The presence of the blue balls is irrelevant in this problem, since whenever a blue ball is drawn it is 
ignored. Hence, the answer is the same as in part (a) of Exercise 6. 


8. There are $\binom{10}{7}$ ways of choosing the seven envelopes into which the red cards will be placed. There
are $\binom{7}{j}\binom{3}{7-j}$ ways of choosing exactly $j$ red envelopes and $7 - j$ green envelopes. Therefore, the
probability that exactly $j$ red envelopes will contain red cards is

$$\frac{\binom{7}{j}\binom{3}{7-j}}{\binom{10}{7}} \quad \text{for } j = 4,5,6,7.$$

But if $j$ red envelopes contain red cards, then $j - 4$ green envelopes must also contain green cards.
Hence, this is also the probability of exactly $k = j + (j - 4) = 2j - 4$ matches.


9. There are $\binom{10}{5}$ ways of choosing the five envelopes into which the red cards will be placed. There
are $\binom{7}{j}\binom{3}{5-j}$ ways of choosing exactly $j$ red envelopes and $5 - j$ green envelopes. Therefore the
probability that exactly $j$ red envelopes will contain red cards is

$$\frac{\binom{7}{j}\binom{3}{5-j}}{\binom{10}{5}} \quad \text{for } j = 2,3,4,5.$$

But if $j$ red envelopes contain red cards, then $j - 2$ green envelopes must also contain green cards.
Hence, this is also the probability of exactly $k = j + (j - 2) = 2j - 2$ matches.


10. If there is a point $x$ that belongs to neither A nor B, then $x$ belongs to both $A^c$ and $B^c$. Hence, $A^c$
and $B^c$ are not disjoint. Therefore, $A^c$ and $B^c$ will be disjoint if and only if $A \cup B = S$.


11. We can use Fig. S.1.1 by relabeling the events A, B, and C in the figure as $A_1$, $A_2$, and $A_3$ respectively.
It is now easy to see that the probability that exactly one of the three events occurs is $p_1 + p_2 + p_3$.
Also,

$$\Pr(A_1) = p_1 + p_4 + p_6 + p_7,$$
$$\Pr(A_1 \cap A_2) = p_4 + p_7, \text{ etc.}$$

By breaking down each probability in the given expression in this way, we obtain the desired result.


12. The proof can be done in a manner similar to that of Theorem 1.10.2. Here is an alternative argument.
Consider first a point that belongs to exactly one of the events $A_1,\ldots,A_n$. Then this point will be
counted in exactly one of the $\Pr(A_i)$ terms in the given expression, and in none of the intersections.
Hence, it will be counted exactly once in the given expression, as required. Now consider a point that
belongs to exactly $r$ of the events $A_1,\ldots,A_n$ ($r \ge 2$). Then it will be counted in exactly $r$ of the $\Pr(A_i)$
terms, exactly $\binom{r}{2}$ of the $\Pr(A_iA_j)$ terms, exactly $\binom{r}{3}$ of the $\Pr(A_iA_jA_k)$ terms, etc. Hence, in the
given expression it will be counted the following number of times:

$$r - 2\binom{r}{2} + 3\binom{r}{3} - \cdots \pm r\binom{r}{r} = r\left[\binom{r-1}{0} - \binom{r-1}{1} + \binom{r-1}{2} - \cdots \pm \binom{r-1}{r-1}\right] = 0,$$

by Exercise 15(b) of Sec. 1.8. Hence, a point will be counted in the given expression if and only if it belongs
to exactly one of the events $A_1,\ldots,A_n$, and then it will be counted exactly once.
13. (a) In order for the winning combination to have no consecutive numbers, between every pair of
numbers in the winning combination there must be at least one number not in the winning
combination. That is, there must be at least $k - 1$ numbers not in the winning combination to be
in between the pairs of numbers in the winning combination. Since there are $k$ numbers in the
winning combination, there must be at least $k + k - 1 = 2k - 1$ numbers available in order for it
to be possible to have no consecutive numbers in the winning combination. So, $n$ must be at least
$2k - 1$ to allow a winning combination with no consecutive numbers.

(b) Let $i_1,\ldots,i_k$ and $j_1,\ldots,j_k$ be as described in the problem. For one direction, suppose that
$i_1,\ldots,i_k$ contains at least one pair of consecutive integers, say $i_{a+1} = i_a + 1$. Then

$$j_{a+1} = i_{a+1} - a = i_a + 1 - a = i_a - (a - 1) = j_a.$$

So, $j_1,\ldots,j_k$ contains repeats. For the other direction, suppose that $j_1,\ldots,j_k$ contains repeats,
say $j_{a+1} = j_a$. Then

$$i_{a+1} = j_{a+1} + a = j_a + a = i_a + 1.$$

So $i_1,\ldots,i_k$ contains a pair of consecutive numbers.

(c) Since $i_1 < i_2 < \cdots < i_k$, we know that $i_a + 1 \le i_{a+1}$, so that $j_a = i_a - a + 1 \le i_{a+1} - a = j_{a+1}$ for
each $a = 1,\ldots,k-1$. Since $i_k \le n$, $j_k = i_k - k + 1 \le n - k + 1$. The number of $(j_1,\ldots,j_k)$ with
$1 \le j_1 < \cdots < j_k \le n - k + 1$ is just the number of combinations of $n - k + 1$ items taken $k$ at a
time, that is $\binom{n-k+1}{k}$.

(d) By part (b), there are no pairs of consecutive integers in the winning combination $(i_1,\ldots,i_k)$ if
and only if $(j_1,\ldots,j_k)$ has no repeats. The total number of winning combinations is $\binom{n}{k}$. In part
(c), we computed the number of winning combinations with no repeats among $(j_1,\ldots,j_k)$ to be
$\binom{n-k+1}{k}$. So, the probability of no consecutive integers is

$$\frac{\binom{n-k+1}{k}}{\binom{n}{k}} = \frac{(n-k)!\,(n-k+1)!}{n!\,(n-2k+1)!}.$$

(e) The probability of at least one pair of consecutive integers is one minus the answer to part (d).
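The closed form in part (d) can be checked against a brute-force enumeration for small lotteries. The following is a minimal Python sketch (an illustration, not part of the original solution; the function names are hypothetical).

```python
from itertools import combinations
from math import comb


def pr_no_consecutive(n: int, k: int) -> float:
    """Closed form from part (d): C(n - k + 1, k) / C(n, k)."""
    return comb(n - k + 1, k) / comb(n, k)


def pr_no_consecutive_brute(n: int, k: int) -> float:
    """Enumerate every k-subset of 1..n and count those without adjacent numbers."""
    combos = list(combinations(range(1, n + 1), k))
    good = sum(all(b - a > 1 for a, b in zip(c, c[1:])) for c in combos)
    return good / len(combos)


for n, k in [(6, 2), (8, 3), (10, 4)]:
    print(n, k, pr_no_consecutive(n, k), pr_no_consecutive_brute(n, k))

# Part (e): probability of at least one consecutive pair, here with n = 30 and k = 6.
print(1 - pr_no_consecutive(30, 6))
```

When $n < 2k - 1$, `math.comb` returns 0 for the numerator, which agrees with part (a).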
Chapter 2


Conditional Probability 


2.1 The Definition of Conditional Probability 


Commentary 


It is useful to stress the point raised in the note on page 59. That is, conditional probabilities behave just 
like probabilities. This will come up again in Sec. 3.6 where conditional distributions are introduced. 

This section ends with an extended example called “The Game of Craps”. This example helps to reinforce 
a subtle line of reasoning about conditional probability that was introduced in Example 2.1.5. In particular, 
it uses the idea that conditional probabilities given an event B can be calculated as if we knew ahead of time 
that B had to occur. 


Solutions to Exercises 
1. If $A \subset B$, then $A \cap B = A$ and $\Pr(A \cap B) = \Pr(A)$. So $\Pr(A|B) = \Pr(A)/\Pr(B)$.
2. Since $A \cap B = \emptyset$, it follows that $\Pr(A \cap B) = 0$. Therefore, $\Pr(A|B) = 0$.
3. Since $A \cap S = A$ and $\Pr(S) = 1$, it follows that $\Pr(A|S) = \Pr(A)$.


4. Let $A_i$ stand for the event that the shopper purchases brand A on his $i$th purchase, for $i = 1,2,\ldots$.
Similarly, let $B_i$ be the event that he purchases brand B on the $i$th purchase. Then

$$\Pr(A_1) = \frac{1}{2},$$
$$\Pr(A_2 | A_1) = \frac{1}{3},$$
$$\Pr(B_3 | A_1 \cap A_2) = \frac{2}{3},$$
$$\Pr(B_4 | A_1 \cap A_2 \cap B_3) = \frac{1}{3}.$$

The desired probability is the product of these four probabilities, namely 1/27.


5. Let $R_i$ be the event that a red ball is drawn on the $i$th draw, and let $B_i$ be the event that a blue ball
is drawn on the $i$th draw for $i = 1,\ldots,4$. Then

$$\Pr(R_1) = \frac{r}{r+b},$$
$$\Pr(R_2 | R_1) = \frac{r+k}{r+b+k},$$
$$\Pr(R_3 | R_1 \cap R_2) = \frac{r+2k}{r+b+2k},$$
$$\Pr(B_4 | R_1 \cap R_2 \cap R_3) = \frac{b}{r+b+3k}.$$

The desired probability is the product of these four probabilities, namely

$$\frac{r(r+k)(r+2k)b}{(r+b)(r+b+k)(r+b+2k)(r+b+3k)}.$$
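The product of conditional probabilities can be reproduced by simulating the bookkeeping of the box one draw at a time. The following is a minimal Python sketch. It assumes, as the conditional probabilities above indicate, that each drawn ball is returned together with $k$ additional balls of the same color; the function name and the numerical values of $r$, $b$, and $k$ are illustrative.

```python
from fractions import Fraction


def pr_sequence(r: int, b: int, k: int, colors: str = "RRRB") -> Fraction:
    """Probability of drawing the given color sequence, multiplying one conditional at a time."""
    red, blue = r, b
    prob = Fraction(1)
    for color in colors:
        total = red + blue
        if color == "R":
            prob *= Fraction(red, total)
            red += k   # assumed rule: the ball is returned with k more of its color
        else:
            prob *= Fraction(blue, total)
            blue += k
    return prob


r, b, k = 3, 2, 1
closed_form = Fraction(
    r * (r + k) * (r + 2 * k) * b,
    (r + b) * (r + b + k) * (r + b + 2 * k) * (r + b + 3 * k),
)
print(pr_sequence(r, b, k), closed_form)
```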


6. This problem illustrates the importance of relying on the rules of conditional probability rather than
on intuition to obtain the answer. Intuitively, but incorrectly, it might seem that since the observed
side is green, and since the other side might be either red or green, the probability that it will be
green is 1/2. The correct analysis is as follows: Let A be the event that the selected card is green on
both sides, and let B be the event that the observed side is green. Since each of the three cards is
equally likely to be selected, $\Pr(A) = \Pr(A \cap B) = 1/3$. Also, $\Pr(B) = 1/2$. The desired probability is

$$\Pr(A|B) = \left(\frac{1}{3}\right)\Big/\left(\frac{1}{2}\right) = \frac{2}{3}.$$

7. We know that $\Pr(A) = 0.6$ and $\Pr(A \cap B) = 0.2$. Therefore, $\Pr(B|A) = \frac{0.2}{0.6} = \frac{1}{3}$.


8. In Exercise 2 in Sec. 1.10 it was found that $\Pr(A \cup B \cup C) = 0.85$. Since $\Pr(A) = 0.6$, it follows that

$$\Pr(A | A \cup B \cup C) = \frac{0.60}{0.85} = \frac{12}{17}.$$

9. (a) If card A has been selected, each of the other four cards is equally likely to be the other selected
card. Since three of these four cards are red, the required probability is 3/4.

(b) We know, without being told, that at least one red card must be selected, so this information does
not affect the probabilities of any events. We have

$$\Pr(\text{both cards red}) = \Pr(R_1)\Pr(R_2 | R_1) = \frac{4}{5} \cdot \frac{3}{4} = \frac{3}{5}.$$
10. As in the text, let $\pi_0$ stand for the probability that the sum on the first roll is either 7 or 11, and let
$\pi_i$ be the probability that the sum on the first roll is $i$ and the player goes on to win, for $i = 2,\ldots,12$.
In this version of the game of craps, we have

$$\pi_0 = \frac{2}{9},$$
$$\pi_4 = \pi_{10} = \frac{3}{36} \cdot \frac{3/36}{3/36 + 6/36 + 2/36} = \frac{1}{44},$$
$$\pi_5 = \pi_9 = \frac{4}{36} \cdot \frac{4/36}{4/36 + 6/36 + 2/36} = \frac{1}{27},$$
$$\pi_6 = \pi_8 = \frac{5}{36} \cdot \frac{5/36}{5/36 + 6/36 + 2/36} = \frac{25}{468}.$$

The probability of winning, which is the sum of these probabilities, is 0.448.
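The arithmetic for this version of craps can be carried out exactly with rational numbers. The following is a minimal Python sketch. It assumes, as the denominators in the solution suggest, that once a point is established the player wins by rolling the point again and loses on either a 7 or an 11; the variable and function names are illustrative.

```python
from fractions import Fraction

# Probability of each sum of two fair dice: 6 - |s - 7| ways out of 36.
pr = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}

win_on_first_roll = pr[7] + pr[11]
lose_after_point = pr[7] + pr[11]  # assumed losing sums once a point is set


def pr_point_then_win(point: int) -> Fraction:
    """Probability that the first roll is `point` and the point recurs before a losing sum."""
    return pr[point] * pr[point] / (pr[point] + lose_after_point)


pr_win = win_on_first_roll + sum(pr_point_then_win(s) for s in (4, 5, 6, 8, 9, 10))
print(pr_win, float(pr_win))
```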
11. This is the conditional version of Theorem 1.5.3. From the definition of conditional probability, we
have

$$1 - \Pr(A|B) = 1 - \frac{\Pr(A \cap B)}{\Pr(B)} = \frac{\Pr(B) - \Pr(A \cap B)}{\Pr(B)}. \qquad (S.2.1)$$

According to Theorem 1.5.6 (switching the names A and B), $\Pr(B) - \Pr(A \cap B) = \Pr(A^c \cap B)$.
Combining this with (S.2.1) yields $1 - \Pr(A|B) = \Pr(A^c|B)$.


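12. This is the conditional version of Theorem 1.5.7. Let $A_1 = A \cap D$ and $A_2 = B \cap D$. Then $A_1 \cup A_2 =
(A \cup B) \cap D$ and $A_1 \cap A_2 = A \cap B \cap D$. Now apply Theorem 1.5.7 to determine $\Pr(A_1 \cup A_2)$.

$$\Pr([A \cup B] \cap D) = \Pr(A_1 \cup A_2) = \Pr(A_1) + \Pr(A_2) - \Pr(A_1 \cap A_2) = \Pr(A \cap D) + \Pr(B \cap D) - \Pr(A \cap B \cap D).$$

Now, divide the extreme left and right ends of this string of equalities by $\Pr(D)$ to obtain

$$\Pr(A \cup B | D) = \frac{\Pr([A \cup B] \cap D)}{\Pr(D)} = \frac{\Pr(A \cap D) + \Pr(B \cap D) - \Pr(A \cap B \cap D)}{\Pr(D)} = \Pr(A|D) + \Pr(B|D) - \Pr(A \cap B|D).$$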


13. Let $A_1$ denote the event that the selected coin has a head on each side, let $A_2$ denote the event that it
has a tail on each side, let $A_3$ denote the event that it is fair, and let B denote the event that a head
is obtained. Then

$$\Pr(A_1) = \frac{3}{9}, \quad \Pr(A_2) = \frac{4}{9}, \quad \Pr(A_3) = \frac{2}{9},$$
$$\Pr(B|A_1) = 1, \quad \Pr(B|A_2) = 0, \quad \Pr(B|A_3) = \frac{1}{2}.$$

Hence,

$$\Pr(B) = \sum_{i=1}^{3} \Pr(A_i)\Pr(B|A_i) = \frac{4}{9}.$$


14. We partition the space of possibilities into three events $B_1$, $B_2$, $B_3$ as follows. Let $B_1$ be the event that
the machine is in good working order. Let $B_2$ be the event that the machine is wearing down. Let $B_3$ be
the event that it needs maintenance. We are told that $\Pr(B_1) = 0.8$ and $\Pr(B_2) = \Pr(B_3) = 0.1$. Let
A be the event that a part is defective. We are asked to find $\Pr(A)$. We are told that $\Pr(A|B_1) = 0.02$,
$\Pr(A|B_2) = 0.1$, and $\Pr(A|B_3) = 0.3$. The law of total probability allows us to compute $\Pr(A)$ as
follows:

$$\Pr(A) = \sum_{j=1}^{3} \Pr(B_j)\Pr(A|B_j) = 0.8 \times 0.02 + 0.1 \times 0.1 + 0.1 \times 0.3 = 0.056.$$
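The law of total probability used in the two preceding solutions is a weighted sum, which is easy to express in code. The following is a minimal Python sketch (illustrative only; the function name `total_probability` is hypothetical) applied to the machine example and to the coin example.

```python
from fractions import Fraction


def total_probability(priors, conditionals):
    """Return sum_j Pr(B_j) * Pr(A | B_j) for a partition B_1, ..., B_k."""
    if sum(priors) != 1:
        raise ValueError("the probabilities of the partition must sum to 1")
    return sum(p * c for p, c in zip(priors, conditionals))


# Machine example: good working order, wearing down, needs maintenance.
pr_defective = total_probability(
    [Fraction(8, 10), Fraction(1, 10), Fraction(1, 10)],
    [Fraction(2, 100), Fraction(1, 10), Fraction(3, 10)],
)

# Coin example: two-headed, two-tailed, and fair coins.
pr_head = total_probability(
    [Fraction(3, 9), Fraction(4, 9), Fraction(2, 9)],
    [Fraction(1), Fraction(0), Fraction(1, 2)],
)

print(float(pr_defective), pr_head)
```

Exact fractions are used so that the check that the partition probabilities sum to 1 is not affected by floating-point rounding.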
15. The analysis is similar to that given in the previous exercise, and the probability is 0.47.


16. In the usual notation, we have

$$\Pr(B_2) = \Pr(A_1 \cap B_2) + \Pr(B_1 \cap B_2) = \Pr(A_1)\Pr(B_2|A_1) + \Pr(B_1)\Pr(B_2|B_1) = \frac{1}{4} \cdot \frac{2}{3} + \frac{3}{4} \cdot \frac{1}{3} = \frac{5}{12}.$$


17. Clearly, we must assume that $\Pr(B_j \cap C) > 0$ for all $j$, otherwise (2.1.5) is undefined. By applying the
definition of conditional probability to each term, the right side of (2.1.5) can be rewritten as

$$\sum_{j} \frac{\Pr(B_j \cap C)}{\Pr(C)} \cdot \frac{\Pr(A \cap B_j \cap C)}{\Pr(B_j \cap C)} = \frac{1}{\Pr(C)} \sum_{j} \Pr(A \cap B_j \cap C).$$

According to the law of total probability, the last sum above is $\Pr(A \cap C)$, hence the ratio is $\Pr(A|C)$.


2.2 Independent Events 


Commentary 


Near the end of this section, we introduce conditionally independent events. This is a prelude to conditionally 
independent and conditionally i.i.d. random variables that are introduced in Sec. 3.7. Conditional independence
has become more popular in statistical modeling with the introduction of latent-variable models and
expert systems. Although these models are not introduced in this text, students who will encounter them in 
the future would do well to study conditional independence early and often. 

Conditional independence is also useful for illustrating how learning data can change the distribution of 
an unknown value. The first examples of this come in Sec. 2.3 after Bayes’ theorem. The assumption that 
a sample of random variables is conditionally i.i.d. given an unknown parameter is the analog in Bayesian 
inference to the assumption that the random sample is i.i.d. marginally. Instructors who are not going to 
cover Bayesian topics might wish to bypass this material, even though it can also be useful in its own right. 
If you decide to not discuss conditional independence, then there is some material later in the book that you 
might wish to bypass as well: 


• Exercise 23 in this section.
• The discussion of conditionally independent events on pages 81-84 in Sec. 2.3.
• Exercises 12, 14 and 15 in Sec. 2.3.
• The discussion of conditionally independent random variables that starts on page 163.
• Exercises 13 and 14 in Sec. 3.7.
• Virtually all of the Bayesian material.


This section ends with an extended example called “The Collector’s Problem”. This example combines 
methods from Chapters 1 and 2 to solve an easily stated but challenging problem. 
Solutions to Exercises


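1. If $\Pr(B) < 1$, then $\Pr(B^c) = 1 - \Pr(B) > 0$. We then compute

\[
\begin{aligned}
\Pr(A^c \mid B^c) &= \frac{\Pr(A^c \cap B^c)}{\Pr(B^c)} = \frac{1 - \Pr(A \cup B)}{1 - \Pr(B)} \\
&= \frac{1 - \Pr(A) - \Pr(B) + \Pr(A \cap B)}{1 - \Pr(B)} \\
&= \frac{1 - \Pr(A) - \Pr(B) + \Pr(A)\Pr(B)}{1 - \Pr(B)} \\
&= \frac{[1 - \Pr(A)][1 - \Pr(B)]}{1 - \Pr(B)} = 1 - \Pr(A) = \Pr(A^c).
\end{aligned}
\]


2.
\[
\begin{aligned}
\Pr(A^c \cap B^c) &= \Pr[(A \cup B)^c] = 1 - \Pr(A \cup B) \\
&= 1 - [\Pr(A) + \Pr(B) - \Pr(A \cap B)] \\
&= 1 - \Pr(A) - \Pr(B) + \Pr(A)\Pr(B) \\
&= [1 - \Pr(A)][1 - \Pr(B)] \\
&= \Pr(A^c)\Pr(B^c).
\end{aligned}
\]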


3. Since the event $A \cap B$ is a subset of the event $A$, and $\Pr(A) = 0$, it follows that $\Pr(A \cap B) = 0$. Hence,
$\Pr(A \cap B) = 0 = \Pr(A)\Pr(B)$.


4. The probability that the sum will be seven on any given roll of the dice is 1/6. The probability that
this event will occur on three successive rolls is therefore $(1/6)^3$.


5. The probability that both systems will malfunction is $(0.001)^2 = 10^{-6}$. The probability that at least
one of the systems will function is therefore $1 - 10^{-6}$.


6. The probability that the man will win the first lottery is 100/10000 = 0.01, and the probability that 
he will win the second lottery is 100/5000 = 0.02. The probability that he will win at least one lottery 
is, therefore, 


\[ 0.01 + 0.02 - (0.01)(0.02) = 0.0298. \]


7. Let $E_1$ be the event that A is in class, and let $E_2$ be the event that B is in class. Let $C$ be the event
that at least one of the students is in class. That is, $C = E_1 \cup E_2$.


(a) We want $\Pr(C)$. We shall use Theorem 1.5.7 to compute the probability. Since $E_1$ and $E_2$ are
independent, we have $\Pr(E_1 \cap E_2) = \Pr(E_1)\Pr(E_2)$. Hence

\[ \Pr(C) = \Pr(E_1) + \Pr(E_2) - \Pr(E_1 \cap E_2) = 0.8 + 0.6 - 0.8 \times 0.6 = 0.92. \]

(b) We want $\Pr(E_1 \mid C)$. We computed $\Pr(C) = 0.92$ in part (a). Since $E_1 \subset C$, $\Pr(E_1 \cap C) = \Pr(E_1) = 0.8$.
So, $\Pr(E_1 \mid C) = 0.8/0.92 = 0.8696$.


8. The probability that all three numbers will be equal to a specified value is $1/6^3$. Therefore, the
probability that all three numbers will be equal to any one of the six possible values is $6/6^3 = 1/36$.


9. The probability that exactly $n$ tosses will be required on a given performance is $1/2^n$. Therefore, the
probability that exactly $n$ tosses will be required on all three performances is $(1/2^n)^3 = 1/8^n$. The
probability that the same number of tosses will be required on all three performances is

\[ \sum_{n=1}^{\infty} \frac{1}{8^n} = \frac{1}{7}. \]


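10. The probability $p_j$ that exactly $j$ children will have blue eyes is

\[ p_j = \binom{5}{j} \left(\frac{1}{4}\right)^{j} \left(\frac{3}{4}\right)^{5-j} \quad \text{for } j = 0, 1, \ldots, 5. \]

The desired probability is

\[ \frac{p_3 + p_4 + p_5}{p_1 + p_2 + p_3 + p_4 + p_5}. \]


11. (a) We must determine the probability that at least two of the four oldest children will have blue eyes.
The probability $p_j$ that exactly $j$ of these four children will have blue eyes is

\[ p_j = \binom{4}{j} \left(\frac{1}{4}\right)^{j} \left(\frac{3}{4}\right)^{4-j}. \]

The desired probability is therefore $p_2 + p_3 + p_4$.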


(b) The two different types of information provided in Exercise 10 and part (a) are similar to the two 
different types of information provided in part (a) and part (b) of Exercise 9 of Sec. 2.1. 


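12. (a) $\Pr(A^c \cap B^c \cap C^c) = \Pr(A^c)\Pr(B^c)\Pr(C^c) = \frac{3}{4} \cdot \frac{2}{3} \cdot \frac{1}{2} = \frac{1}{4}$.


(b) The desired probability is

\[
\Pr(A \cap B^c \cap C^c) + \Pr(A^c \cap B \cap C^c) + \Pr(A^c \cap B^c \cap C)
= \frac{1}{4} \cdot \frac{2}{3} \cdot \frac{1}{2} + \frac{3}{4} \cdot \frac{1}{3} \cdot \frac{1}{2} + \frac{3}{4} \cdot \frac{2}{3} \cdot \frac{1}{2} = \frac{11}{24}.
\]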


13. The probability of obtaining a particular sequence of ten particles in which one particle penetrates the
shield and nine particles do not is $(0.01)(0.99)^9$. Since there are 10 such sequences in the sample space,
the desired probability is $10(0.01)(0.99)^9$.


14. The probability that none of the ten particles will penetrate the shield is $(0.99)^{10}$. Therefore, the
probability that at least one particle will penetrate the shield is $1 - (0.99)^{10}$.


15. If $n$ particles are emitted, the probability that at least one particle will penetrate the shield is $1 - (0.99)^n$.
In order for this value to be at least 0.8 we must have

\[
\begin{aligned}
1 - (0.99)^n &\ge 0.8, \\
(0.99)^n &\le 0.2, \\
n \log(0.99) &\le \log(0.2).
\end{aligned}
\]

Since $\log(0.99)$ is negative, this final relation is equivalent to the relation

\[ n \ge \frac{\log(0.2)}{\log(0.99)} \approx 160.1. \]

So 161 or more particles are needed.
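
Students sometimes stumble over the direction of the inequality when dividing by a negative logarithm. The following Python sketch checks the particle count numerically; the function name `min_particles` and its parameters are illustrative choices, not anything from the text.

```python
import math


def min_particles(p_penetrate: float, target: float) -> int:
    """Smallest n with 1 - (1 - p_penetrate)**n >= target."""
    # Both logarithms are negative, so dividing flips the inequality:
    # n >= log(1 - target) / log(1 - p_penetrate).
    bound = math.log(1.0 - target) / math.log(1.0 - p_penetrate)
    n = math.ceil(bound)
    # Guard against floating-point rounding right at the boundary.
    while 1.0 - (1.0 - p_penetrate) ** n < target:
        n += 1
    return n


print(min_particles(0.01, 0.8))
```

With a penetration probability of 0.01 and a target of 0.8, the function returns the same count as the solution above.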


16. To determine the probability that team A will win the World Series, we shall calculate the probabilities
that A will win in exactly four, five, six, and seven games, and then sum these probabilities. The
probability that A will win four straight games is $(1/3)^4$. The probability that A will win in five games
is equal to the probability that the fourth victory of team A will occur in the fifth game. As explained
in Example 2.2.8, this probability is $\binom{4}{3}\left(\frac{1}{3}\right)^4\left(\frac{2}{3}\right)$. Similarly, the probabilities that A will win in six
games and in seven games are $\binom{5}{3}\left(\frac{1}{3}\right)^4\left(\frac{2}{3}\right)^2$ and $\binom{6}{3}\left(\frac{1}{3}\right)^4\left(\frac{2}{3}\right)^3$, respectively. By summing these
probabilities, we obtain the result

\[ \sum_{i=3}^{6} \binom{i}{3} \left(\frac{1}{3}\right)^{4} \left(\frac{2}{3}\right)^{i-3}, \]

which equals 379/2187.

A second way to solve this problem is to pretend that all seven games are going to be played, regardless
of whether one team has won four games before the seventh game. From this point of view, of the
seven games that are played, the team that wins the World Series might win four, five, six, or seven
games. Therefore, the probability that team A will win the series can be determined by calculating the
probabilities that team A will win exactly four, five, six, and seven games, and then summing these
probabilities. In this way, we obtain the result

\[ \sum_{i=4}^{7} \binom{7}{i} \left(\frac{1}{3}\right)^{i} \left(\frac{2}{3}\right)^{7-i}. \]

It can be shown that this answer is equal to the answer that we obtained first.
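
The claim that the two World Series approaches agree can be checked with exact rational arithmetic. The sketch below is only an illustration; the function names are made up here, and the win probability 1/3 per game is the one used in the solution.

```python
from fractions import Fraction
from math import comb


def win_series_stop_early(p: Fraction) -> Fraction:
    """A's fourth win comes in game i + 1, after exactly 3 wins in the first i games."""
    q = 1 - p
    return sum(comb(i, 3) * p**4 * q ** (i - 3) for i in range(3, 7))


def win_series_play_all(p: Fraction) -> Fraction:
    """Pretend all 7 games are played; A wins the series if it wins at least 4."""
    q = 1 - p
    return sum(comb(7, i) * p**i * q ** (7 - i) for i in range(4, 8))


p = Fraction(1, 3)
print(win_series_stop_early(p), win_series_play_all(p))
```

Both functions return 379/2187 for p = 1/3, and they continue to agree for other values of p.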


17. In order for the target to be hit for the first time on the third throw of boy A, all five of the following
independent events must occur: (1) A misses on his first throw, (2) B misses on his first throw, (3)
A misses on his second throw, (4) B misses on his second throw, (5) A hits on his third throw. The
probability of all five events occurring is

\[ \frac{2}{3} \cdot \frac{3}{4} \cdot \frac{2}{3} \cdot \frac{3}{4} \cdot \frac{1}{3} = \frac{1}{12}. \]


18. Let $E$ denote the event that boy A hits the target before boy B. There are two methods of solving
this problem. The first method is to note that the event $E$ can occur in two different ways: (i) If A
hits the target on the first throw. This event occurs with probability 1/3. (ii) If both A and B miss the
target on their first throws, and then subsequently A hits the target before B. The probability that
A and B will both miss on their first throws is $\frac{2}{3} \cdot \frac{3}{4} = \frac{1}{2}$. When they do miss, the conditions of the
game become exactly the same as they were at the beginning of the game. In effect, it is as if the boys
were starting a new game all over again, and so the probability that A will subsequently hit the target
before B is again $\Pr(E)$. Therefore, by considering these two ways in which the event $E$ can occur, we
obtain the relation

\[ \Pr(E) = \frac{1}{3} + \frac{1}{2}\Pr(E). \]

The solution is $\Pr(E) = \frac{2}{3}$.

The second method of solving the problem is to calculate the probabilities that the target will be hit
for the first time on boy A’s first throw, on his second throw, on his third throw, etc., and then to sum
these probabilities. For the target to be hit for the first time on his $j$th throw, both A and B must
miss on each of their first $j - 1$ throws, and then A must hit on his next throw. The probability of this
event is

\[ \left(\frac{2}{3}\right)^{j-1} \left(\frac{3}{4}\right)^{j-1} \left(\frac{1}{3}\right) = \left(\frac{1}{2}\right)^{j-1} \left(\frac{1}{3}\right). \]

Summing over $j = 1, 2, \ldots$ gives $\Pr(E) = \frac{1}{3} \sum_{j=1}^{\infty} \left(\frac{1}{2}\right)^{j-1} = \frac{2}{3}$, in agreement with the first method.


19. Let $A_1$ denote the event that no red balls are selected, let $A_2$ denote the event that no white balls are
selected, and let $A_3$ denote the event that no blue balls are selected. We must determine the value
of $\Pr(A_1 \cup A_2 \cup A_3)$. We shall apply Theorem 1.10.1. The event $A_1$ will occur if and only if all ten
selected balls are white or blue. Since there is probability 0.8 that any given selected ball will be white
or blue, we have $\Pr(A_1) = (0.8)^{10}$. Similarly, $\Pr(A_2) = (0.7)^{10}$ and $\Pr(A_3) = (0.5)^{10}$. The event $A_1 \cap A_2$
will occur if and only if all ten selected balls are blue. Therefore $\Pr(A_1 \cap A_2) = (0.5)^{10}$. Similarly,
$\Pr(A_2 \cap A_3) = (0.2)^{10}$ and $\Pr(A_1 \cap A_3) = (0.3)^{10}$. Finally, the event $A_1 \cap A_2 \cap A_3$ cannot possibly occur,
so $\Pr(A_1 \cap A_2 \cap A_3) = 0$. So, the desired probability is

\[ (0.8)^{10} + (0.7)^{10} + (0.5)^{10} - (0.5)^{10} - (0.2)^{10} - (0.3)^{10} + 0 = 0.1356. \]
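
The inclusion-exclusion bookkeeping in the ball-selection solution is easy to get wrong by hand, so a numerical check may be useful. The sketch below assumes the colour probabilities implied by the solution (red 0.2, white 0.3, blue 0.5) and ten independent selections; the function name is hypothetical.

```python
from itertools import combinations


def prob_some_colour_missing(probs, draws):
    """Inclusion-exclusion for the event that at least one colour never appears."""
    total = 0.0
    colours = range(len(probs))
    for r in range(1, len(probs) + 1):
        sign = (-1) ** (r + 1)
        for subset in combinations(colours, r):
            # Probability that a single draw avoids every colour in the subset.
            avoid = max(0.0, 1.0 - sum(probs[c] for c in subset))
            total += sign * avoid**draws
    return total


print(round(prob_some_colour_missing([0.2, 0.3, 0.5], 10), 4))
```

The three single-colour terms, the three pairwise terms, and the (zero) triple term correspond one for one to the terms in the displayed sum.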


20. To prove that $B_1, \ldots, B_k$ are independent events, we must prove that for every subset of $r$ of these
events ($r = 1, \ldots, k$), we have

\[ \Pr(B_{i_1} \cap \cdots \cap B_{i_r}) = \Pr(B_{i_1}) \cdots \Pr(B_{i_r}). \]

We shall simplify the notation by writing simply $B_1, \ldots, B_r$ instead of $B_{i_1}, \ldots, B_{i_r}$. Hence, we must
show that

\[ \Pr(B_1 \cap \cdots \cap B_r) = \Pr(B_1) \cdots \Pr(B_r). \tag{S.2.2} \]

Suppose that the relation (S.2.2) is satisfied whenever $B_j = A_j^c$ for $m$ or fewer values of $j$ and $B_j = A_j$
for the other $k - m$ or more values of $j$. We shall show that (S.2.2) is also satisfied whenever $B_j = A_j^c$
for $m + 1$ values of $j$. Without loss of generality, we shall assume that $j = r$ is one of these $m + 1$
values, so that $B_r = A_r^c$. It is always true that

\[ \Pr(B_1 \cap \cdots \cap B_r) = \Pr(B_1 \cap \cdots \cap B_{r-1}) - \Pr(B_1 \cap \cdots \cap B_{r-1} \cap B_r^c). \]

Since among the events $B_1, \ldots, B_{r-1}$ there are $m$ or fewer values of $j$ such that $B_j = A_j^c$, it follows
from the induction hypothesis that

\[ \Pr(B_1 \cap \cdots \cap B_{r-1}) = \Pr(B_1) \cdots \Pr(B_{r-1}). \]

Furthermore, since $B_r^c = A_r$, the same induction hypothesis implies that

\[ \Pr(B_1 \cap \cdots \cap B_{r-1} \cap B_r^c) = \Pr(B_1) \cdots \Pr(B_{r-1}) \Pr(B_r^c). \]

It now follows that

\[ \Pr(B_1 \cap \cdots \cap B_r) = \Pr(B_1) \cdots \Pr(B_{r-1}) [1 - \Pr(B_r^c)] = \Pr(B_1) \cdots \Pr(B_r). \]

Thus, we have shown that if the events $B_1, \ldots, B_k$ are independent whenever there are $m$ or fewer
values of $j$ such that $B_j = A_j^c$, then the events $B_1, \ldots, B_k$ are also independent whenever there are
$m + 1$ values of $j$ such that $B_j = A_j^c$. Since $B_1, \ldots, B_k$ are obviously independent whenever there are
zero values of $j$ such that $B_j = A_j^c$ (i.e., whenever $B_j = A_j$ for $j = 1, \ldots, k$), the induction argument is
complete. Therefore, the events $B_1, \ldots, B_k$ are independent regardless of whether $B_j = A_j$ or $B_j = A_j^c$
for each value of $j$.


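21. For the “only if” direction, we need to prove that if $A_1, \ldots, A_k$ are independent then

\[ \Pr(A_{i_1} \cap \cdots \cap A_{i_m} \mid A_{j_1} \cap \cdots \cap A_{j_\ell}) = \Pr(A_{i_1} \cap \cdots \cap A_{i_m}), \]

for all disjoint subsets $\{i_1, \ldots, i_m\}$ and $\{j_1, \ldots, j_\ell\}$ of $\{1, \ldots, k\}$. If $A_1, \ldots, A_k$ are independent, then

\[ \Pr(A_{i_1} \cap \cdots \cap A_{i_m} \cap A_{j_1} \cap \cdots \cap A_{j_\ell}) = \Pr(A_{i_1} \cap \cdots \cap A_{i_m}) \Pr(A_{j_1} \cap \cdots \cap A_{j_\ell}), \]

hence it follows that

\[
\Pr(A_{i_1} \cap \cdots \cap A_{i_m} \mid A_{j_1} \cap \cdots \cap A_{j_\ell})
= \frac{\Pr(A_{i_1} \cap \cdots \cap A_{i_m} \cap A_{j_1} \cap \cdots \cap A_{j_\ell})}{\Pr(A_{j_1} \cap \cdots \cap A_{j_\ell})}
= \Pr(A_{i_1} \cap \cdots \cap A_{i_m}).
\]

For the “if” direction, assume that $\Pr(A_{i_1} \cap \cdots \cap A_{i_m} \mid A_{j_1} \cap \cdots \cap A_{j_\ell}) = \Pr(A_{i_1} \cap \cdots \cap A_{i_m})$ for
all disjoint subsets $\{i_1, \ldots, i_m\}$ and $\{j_1, \ldots, j_\ell\}$ of $\{1, \ldots, k\}$. We must prove that $A_1, \ldots, A_k$ are
independent. That is, we must prove that for every subset $\{s_1, \ldots, s_n\}$ of $\{1, \ldots, k\}$, $\Pr(A_{s_1} \cap \cdots \cap A_{s_n}) =
\Pr(A_{s_1}) \cdots \Pr(A_{s_n})$. We shall do this by induction on $n$. For $n = 1$, we have that $\Pr(A_{s_1}) = \Pr(A_{s_1})$
for each subset $\{s_1\}$ of $\{1, \ldots, k\}$. Now, assume that for all $n \le n_0$ and for all subsets $\{s_1, \ldots, s_n\}$ of
$\{1, \ldots, k\}$ it is true that $\Pr(A_{s_1} \cap \cdots \cap A_{s_n}) = \Pr(A_{s_1}) \cdots \Pr(A_{s_n})$. We need to prove that for every
subset $\{t_1, \ldots, t_{n_0+1}\}$ of $\{1, \ldots, k\}$

\[ \Pr(A_{t_1} \cap \cdots \cap A_{t_{n_0+1}}) = \Pr(A_{t_1}) \cdots \Pr(A_{t_{n_0+1}}). \tag{S.2.3} \]

It is clear that

\[ \Pr(A_{t_1} \cap \cdots \cap A_{t_{n_0+1}}) = \Pr(A_{t_1} \cap \cdots \cap A_{t_{n_0}} \mid A_{t_{n_0+1}}) \Pr(A_{t_{n_0+1}}). \tag{S.2.4} \]

We have assumed that $\Pr(A_{t_1} \cap \cdots \cap A_{t_{n_0}} \mid A_{t_{n_0+1}}) = \Pr(A_{t_1} \cap \cdots \cap A_{t_{n_0}})$ for all disjoint subsets $\{t_1, \ldots, t_{n_0}\}$
and $\{t_{n_0+1}\}$ of $\{1, \ldots, k\}$. Since the right side of this last equation is the probability of the intersection
of only $n_0$ events, then we know that

\[ \Pr(A_{t_1} \cap \cdots \cap A_{t_{n_0}}) = \Pr(A_{t_1}) \cdots \Pr(A_{t_{n_0}}). \]

Combining this with Eq. (S.2.4) implies that (S.2.3) holds.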


22. For the “only if” direction, we assume that $A_1$ and $A_2$ are conditionally independent given $B$ and we
must prove that $\Pr(A_2 \mid A_1 \cap B) = \Pr(A_2 \mid B)$. Since $A_1$ and $A_2$ are conditionally independent given $B$,
$\Pr(A_1 \cap A_2 \mid B) = \Pr(A_1 \mid B)\Pr(A_2 \mid B)$. This implies that

\[ \Pr(A_2 \mid B) = \frac{\Pr(A_1 \cap A_2 \mid B)}{\Pr(A_1 \mid B)}. \]

Also,

\[
\Pr(A_2 \mid A_1 \cap B) = \frac{\Pr(A_1 \cap A_2 \cap B)}{\Pr(A_1 \cap B)}
= \frac{\Pr(A_1 \cap A_2 \cap B)/\Pr(B)}{\Pr(A_1 \cap B)/\Pr(B)}
= \frac{\Pr(A_1 \cap A_2 \mid B)}{\Pr(A_1 \mid B)}.
\]

Hence, $\Pr(A_2 \mid A_1 \cap B) = \Pr(A_2 \mid B)$.


For the “if” direction, we assume that $\Pr(A_2 \mid A_1 \cap B) = \Pr(A_2 \mid B)$, and we must prove that $A_1$ and $A_2$ are
conditionally independent given $B$. That is, we must prove that $\Pr(A_1 \cap A_2 \mid B) = \Pr(A_1 \mid B)\Pr(A_2 \mid B)$.
We know that

\[ \Pr(A_1 \cap A_2 \mid B)\Pr(B) = \Pr(A_2 \mid A_1 \cap B)\Pr(A_1 \cap B), \]

since both sides are equal to $\Pr(A_1 \cap A_2 \cap B)$. Divide both sides of this equation by $\Pr(B)$ and use the
assumption $\Pr(A_2 \mid A_1 \cap B) = \Pr(A_2 \mid B)$ together with $\Pr(A_1 \cap B)/\Pr(B) = \Pr(A_1 \mid B)$ to obtain

\[ \Pr(A_1 \cap A_2 \mid B) = \Pr(A_2 \mid B)\Pr(A_1 \mid B). \]


23. (a) Conditional on $B$ the events $A_1, \ldots, A_{11}$ are independent with probability 0.8 each. The
conditional probability that a particular collection of eight programs out of the 11 will compile is
$0.8^8 \, 0.2^3 = 0.001342$. There are $\binom{11}{8} = 165$ different such collections of eight programs out of the
11, so the probability that exactly eight programs will compile is $165 \times 0.001342 = 0.2215$.


(b) Conditional on $B^c$ the events $A_1, \ldots, A_{11}$ are independent with probability 0.4 each. The
conditional probability that a particular collection of eight programs out of the 11 will compile is
$0.4^8 \, 0.6^3 = 0.0001416$. There are $\binom{11}{8} = 165$ different such collections of eight programs out of
the 11, so the probability that exactly eight programs will compile is $165 \times 0.0001416 = 0.02335$.
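
Both parts of the compiling-programs solution are binomial probabilities, which suggests a quick numerical check. The helper name `binom_pmf` below is an illustrative choice; the sketch uses only the Python standard library.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Conditional on an easy task (success probability 0.8 per program).
print(round(binom_pmf(8, 11, 0.8), 4))
# Conditional on a hard task (success probability 0.4 per program).
print(round(binom_pmf(8, 11, 0.4), 4))
```

The small difference in the last digit of the second value relative to the solution comes from rounding the intermediate product before multiplying by 165.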


24. Let $n > 1$, and assume that $A_1, \ldots, A_n$ are mutually exclusive. For the “if” direction, assume that at
most one of the events has strictly positive probability. Then, the intersection of every collection of size 
2 or more has probability 0. Also, the product of every collection of 2 or more probabilities is 0, so the 
events satisfy Definition 2.2.2 and are mutually independent. For the “only if” direction, assume that 
the events are mutually independent. The intersection of every collection of size 2 or more is empty 
and must have probability 0. Hence the product of the probabilities of every collection of size 2 or more 
must be 0. This means that at least one factor from every product of at least 2 probabilities must itself 
be 0. Hence there can be no more than one of the probabilities greater than 0, otherwise the product 
of the two nonzero probabilities would be nonzero. 


2.3 Bayes’ Theorem


Commentary 


This section ends with two extended discussions on how Bayes’ theorem is applied. The first involves a 
sequence of simple updates to the probability of a specific event. It illustrates how conditional independence 
allows one to use posterior probabilities after observing some events as prior probabilities before observing 
later events. This idea is subtle, but very useful in Bayesian inference. The second discussion builds upon this 
idea and illustrates the type of reasoning that can be used in real inference problems. Examples 2.3.7 and 2.3.8 
are particularly useful in this regard. They show how data can bring very different prior probabilities into 
closer posterior agreement. Exercise 12 illustrates the effect of the size of a sample on the degree to which
the data can reduce differences in subjective probabilities.

Statistical software like R can be used to facilitate calculations like those that occur in the above-mentioned
examples. For example, suppose that the 11 prior probabilities are assigned to the vector prior and that
the data consist of s successes and f failures. Then the posterior probabilities can be computed by
ints=1:11
# prior times likelihood for success probabilities 0, 0.1, ..., 1
post=prior*((ints-1)/10)^s*(1-(ints-1)/10)^f
# normalize so that the posterior probabilities sum to 1
post=post/sum(post)


Solutions to Exercises 
1. It must be true that $\sum_{i=1}^{k} \Pr(B_i) = 1$ and $\sum_{i=1}^{k} \Pr(B_i \mid A) = 1$. However, if $\Pr(B_1 \mid A) < \Pr(B_1)$ and
$\Pr(B_i \mid A) \le \Pr(B_i)$ for $i = 2, \ldots, k$, we would have $\sum_{i=1}^{k} \Pr(B_i \mid A) < \sum_{i=1}^{k} \Pr(B_i)$, a contradiction.
Therefore, it must be true that $\Pr(B_i \mid A) > \Pr(B_i)$ for at least one value of $i$ ($i = 2, \ldots, k$).


2. It was shown in the text that $\Pr(A_2 \mid B) = 0.26 < \Pr(A_2) = 0.3$. Similarly,

\[ \Pr(A_1 \mid B) = \frac{(0.2)(0.01)}{(0.2)(0.01) + (0.3)(0.02) + (0.5)(0.03)} = 0.09. \]

Since $\Pr(A_1) = 0.2$, we have $\Pr(A_1 \mid B) < \Pr(A_1)$. Furthermore,

\[ \Pr(A_3 \mid B) = \frac{(0.5)(0.03)}{(0.2)(0.01) + (0.3)(0.02) + (0.5)(0.03)} = 0.65. \]

Since $\Pr(A_3) = 0.5$, we have $\Pr(A_3 \mid B) > \Pr(A_3)$.

3. Let $C$ denote the event that the selected item is nondefective. Then

\[ \Pr(A_2 \mid C) = \frac{(0.3)(0.98)}{(0.2)(0.99) + (0.3)(0.98) + (0.5)(0.97)} = 0.301. \]

Commentary: It should be noted that if the selected item is observed to be defective, the probability
that the item was produced by machine $M_2$ is decreased from the prior value of 0.3 to the posterior
value of 0.26. However, if the selected item is observed to be nondefective, this probability changes
very little, from a prior value of 0.3 to a posterior value of 0.301. In this example, therefore, obtaining
a defective is more informative than obtaining a nondefective, but it is much more probable that a
nondefective will be obtained.
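
The contrast between the two posterior values in the machine example can be reproduced with a small general-purpose routine. The sketch below is one possible way to code Bayes' theorem for a finite partition; the function name `posterior` is an assumption, while the prior and defect probabilities are the ones appearing in the formulas above.

```python
def posterior(priors, likelihoods):
    """Bayes' theorem for a finite partition: normalize prior times likelihood."""
    joint = [pr * lk for pr, lk in zip(priors, likelihoods)]
    total = sum(joint)  # law of total probability
    return [j / total for j in joint]


priors = [0.2, 0.3, 0.5]           # machines M1, M2, M3
p_defective = [0.01, 0.02, 0.03]   # Pr(defective | machine)

post_defective = posterior(priors, p_defective)
post_nondefective = posterior(priors, [1 - d for d in p_defective])
print([round(x, 3) for x in post_defective])
print([round(x, 3) for x in post_nondefective])
```

The second entry of each list is the posterior probability for machine M2 after a defective and after a nondefective item, respectively.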


4. The desired probability Pr(Cancer | Positive) can be calculated as follows:

\[
\frac{\Pr(\text{Cancer})\Pr(\text{Positive} \mid \text{Cancer})}{\Pr(\text{Cancer})\Pr(\text{Positive} \mid \text{Cancer}) + \Pr(\text{No Cancer})\Pr(\text{Positive} \mid \text{No Cancer})}
= \frac{(0.00001)(0.95)}{(0.00001)(0.95) + (0.99999)(0.05)} = 0.00019.
\]

Commentary: It should be noted that even though this test provides a correct diagnosis 95 percent of
the time, the probability that a person has this type of cancer, given a positive reaction to
the test, is not 0.95. In fact, as this exercise shows, even though a person has a positive reaction, the
probability that the person has this type of cancer is still only 0.00019. In other words, the probability that the
person has this type of cancer is 19 times larger than it was before taking the test $\left(\frac{0.00019}{0.00001} = 19\right)$,
but it is still very small because the disease is so rare in the population.


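5. The desired probability Pr(Lib. | NoVote) can be calculated as follows:

\[
\frac{\Pr(\text{Lib.})\Pr(\text{NoVote} \mid \text{Lib.})}{\Pr(\text{Cons.})\Pr(\text{NoVote} \mid \text{Cons.}) + \Pr(\text{Lib.})\Pr(\text{NoVote} \mid \text{Lib.}) + \Pr(\text{Ind.})\Pr(\text{NoVote} \mid \text{Ind.})}
= \frac{(0.5)(0.18)}{(0.3)(0.35) + (0.5)(0.18) + (0.2)(0.50)} = \frac{18}{59}.
\]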


6. (a) Let $A_1$ denote the event that the machine is adjusted properly, let $A_2$ denote the event that it
is adjusted improperly, and let $B$ be the event that four of the five inspected items are of high
quality. Then

\[
\Pr(A_1 \mid B) = \frac{\Pr(A_1)\Pr(B \mid A_1)}{\Pr(A_1)\Pr(B \mid A_1) + \Pr(A_2)\Pr(B \mid A_2)}
= \frac{(0.9)\binom{5}{4}(0.5)^5}{(0.9)\binom{5}{4}(0.5)^5 + (0.1)\binom{5}{4}(0.25)^4(0.75)} = \frac{96}{97}.
\]

(b) The prior probabilities before this additional item is observed are the values found in part (a):
$\Pr(A_1) = 96/97$ and $\Pr(A_2) = 1/97$. Let $C$ denote the event that the additional item is of medium
quality. Then

\[
\Pr(A_1 \mid C) = \frac{\frac{96}{97} \cdot \frac{1}{2}}{\frac{96}{97} \cdot \frac{1}{2} + \frac{1}{97} \cdot \frac{3}{4}} = \frac{64}{65}.
\]

7. (a) Let $\pi_i$ denote the posterior probability that coin $i$ was selected. The prior probability of each coin
is 1/5. Therefore

\[ \pi_i = \frac{\frac{1}{5} p_i}{\sum_{j=1}^{5} \frac{1}{5} p_j} \quad \text{for } i = 1, \ldots, 5. \]

The five values are $\pi_1 = 0$, $\pi_2 = 0.1$, $\pi_3 = 0.2$, $\pi_4 = 0.3$, and $\pi_5 = 0.4$.


(b) The probability of obtaining another head is equal to

\[ \sum_{i=1}^{5} \Pr(\text{Coin } i)\Pr(\text{Head} \mid \text{Coin } i) = \sum_{i=1}^{5} \pi_i p_i = \frac{3}{4}. \]


(c) The posterior probability $\pi_i$ of coin $i$ would now be

\[ \pi_i = \frac{\frac{1}{5}(1 - p_i)}{\sum_{j=1}^{5} \frac{1}{5}(1 - p_j)} \quad \text{for } i = 1, \ldots, 5. \]

Thus, $\pi_1 = 0.4$, $\pi_2 = 0.3$, $\pi_3 = 0.2$, $\pi_4 = 0.1$, and $\pi_5 = 0$. The probability of obtaining a head on
the next toss is therefore $\sum_{i=1}^{5} \pi_i p_i = \frac{1}{4}$.


8. (a) If coin $i$ is selected, the probability that the first head will be obtained on the fourth toss is
$(1 - p_i)^3 p_i$. Therefore, the posterior probability that coin $i$ was selected is

\[ \pi_i = \frac{\frac{1}{5}(1 - p_i)^3 p_i}{\sum_{j=1}^{5} \frac{1}{5}(1 - p_j)^3 p_j} \quad \text{for } i = 1, \ldots, 5. \]

The five values are $\pi_1 = 0$, $\pi_2 = 0.5870$, $\pi_3 = 0.3478$, $\pi_4 = 0.0652$, and $\pi_5 = 0$.


(b) If coin $i$ is used, the probability that exactly three additional tosses will be required to obtain
another head is $(1 - p_i)^2 p_i$. Therefore, the desired probability is

\[ \sum_{i=1}^{5} \pi_i (1 - p_i)^2 p_i = 0.1291. \]
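
The posterior and predictive calculations for the five coins follow the same pattern each time: weight the prior by the likelihood, normalize, then average the next-event probability over the posterior. The sketch below assumes head probabilities 0, 1/4, 1/2, 3/4, 1 for the five coins (an assumption inferred from the posterior values quoted in these solutions, not stated in this section); the helper name `normalise` is illustrative.

```python
def normalise(weights):
    """Scale nonnegative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


p = [i / 4 for i in range(5)]   # assumed head probabilities of coins 1..5
prior = [1 / 5] * 5

# First head on the fourth toss: three tails, then a head.
post = normalise([pr * (1 - pi) ** 3 * pi for pr, pi in zip(prior, p)])

# Next head after exactly three more tosses: two tails, then a head.
next_in_three = sum(w * (1 - pi) ** 2 * pi for w, pi in zip(post, p))

print([round(w, 4) for w in post])
print(round(next_in_three, 4))
```

Changing the likelihood expression in the `normalise` call is all that is needed to reproduce the other posterior calculations for these coins.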


9. We shall continue to use the notation from the solution to Exercise 14 in Sec. 2.1. Let $C$ be the
event that exactly one out of seven observed parts is defective. We are asked to find $\Pr(B_j \mid C)$ for
$j = 1, 2, 3$. We need $\Pr(C \mid B_j)$ for each $j$. Let $A_i$ be the event that the $i$th part is defective. For all
$i$, $\Pr(A_i \mid B_1) = 0.02$, $\Pr(A_i \mid B_2) = 0.1$, and $\Pr(A_i \mid B_3) = 0.3$. Since the seven parts are conditionally
independent given each state of the machine, the probability of each possible sequence of seven parts
with one defective is $\Pr(A_i \mid B_j)[1 - \Pr(A_i \mid B_j)]^6$. There are seven distinct such sequences, so

\[
\begin{aligned}
\Pr(C \mid B_1) &= 7 \times 0.02 \times 0.98^6 = 0.1240, \\
\Pr(C \mid B_2) &= 7 \times 0.1 \times 0.9^6 = 0.3720, \\
\Pr(C \mid B_3) &= 7 \times 0.3 \times 0.7^6 = 0.2471.
\end{aligned}
\]

The expression in the denominator of Bayes’ theorem is

\[ \Pr(C) = 0.8 \times 0.1240 + 0.1 \times 0.3720 + 0.1 \times 0.2471 = 0.1611. \]

Bayes’ theorem now says

\[
\begin{aligned}
\Pr(B_1 \mid C) &= \frac{0.8 \times 0.1240}{0.1611} = 0.6157, \\
\Pr(B_2 \mid C) &= \frac{0.1 \times 0.3720}{0.1611} = 0.2309, \\
\Pr(B_3 \mid C) &= \frac{0.1 \times 0.2471}{0.1611} = 0.1534.
\end{aligned}
\]


10. Bayes’ theorem says that the posterior probability of each $B_i$ is $\Pr(B_i \mid E) = \Pr(B_i)\Pr(E \mid B_i)/\Pr(E)$.
So $\Pr(B_i \mid E) < \Pr(B_i)$ if and only if $\Pr(E \mid B_i) < \Pr(E)$. Since $\Pr(E) = 3/4$, we need to find those $i$ for
which $\Pr(E \mid B_i) < 3/4$. These are $i = 5, 6$.


11. This time, we want $\Pr(B_4 \mid E^c)$. We know that $\Pr(E^c) = 1 - \Pr(E) = 1/4$ and $\Pr(E^c \mid B_4) = 1 -
\Pr(E \mid B_4) = 1/4$. This means that $E^c$ and $B_4$ are independent so that $\Pr(B_4 \mid E^c) = \Pr(B_4) = 1/4$.

12. We are doing the same calculations as in Examples 2.3.7 and 2.3.8 except that we only have five patients
and three successes. So, in particular

\[
\Pr(B_j \mid A) = \frac{\Pr(B_j)\binom{5}{3}([j-1]/10)^3(1 - [j-1]/10)^2}{\sum_{i=1}^{11} \Pr(B_i)\binom{5}{3}([i-1]/10)^3(1 - [i-1]/10)^2}. \tag{S.2.5}
\]

In one case, $\Pr(B_i) = 1/11$ for all $i$, and in the other case, the prior probabilities are given in the table
in Example 2.3.8 of the text. The numbers $([i-1]/10)^3(1 - [i-1]/10)^2$ that show up in both calculations are

 i   Value      i   Value
 1   0          7   0.0346
 2   0.0008     8   0.0309
 3   0.0051     9   0.0205
 4   0.0132    10   0.0073
 5   0.0230    11   0
 6   0.0313

We can use these with the two sets of prior probabilities to compute the posterior probabilities according
to Eq. (S.2.5).

 i   Example 2.3.7   Example 2.3.8      i   Example 2.3.7   Example 2.3.8
 1   0               0                  7   0.2074          0.1641
 2   0.0049          0.0300             8   0.1852          0.0879
 3   0.0307          0.0972             9   0.1229          0.0389
 4   0.0794          0.1633            10   0.0437          0.0138
 5   0.1383          0.1969            11   0               0
 6   0.1875          0.2077


These numbers are not nearly so close as those in the examples in the text because we do not have as 
much information in the small sample of five patients. 
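
For the five-patient calculation, the posterior under the uniform prior of Example 2.3.7 can be reproduced in a few lines. The Python sketch below is an assumption-laden illustration: the function name is made up, and only the uniform prior is shown because the Example 2.3.8 prior values are not reproduced in this manual. The binomial coefficient is omitted because it cancels between numerator and denominator.

```python
def posterior_success_rates(prior, successes, failures):
    """Posterior over success rates 0, 0.1, ..., 1 given the observed counts."""
    rates = [(i - 1) / 10 for i in range(1, 12)]
    weights = [
        pr * r**successes * (1 - r) ** failures
        for pr, r in zip(prior, rates)
    ]
    total = sum(weights)
    return [w / total for w in weights]


uniform = [1 / 11] * 11
post = posterior_success_rates(uniform, successes=3, failures=2)
print([round(x, 4) for x in post])
```

Passing a different length-11 prior vector as the first argument gives the corresponding posterior for any other set of prior probabilities.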


13. (a) Let $B_1$ be the event that the coin is fair, and let $B_2$ be the event that the coin has two heads.
Let $H_i$ be the event that we obtain a head on the $i$th toss for $i = 1, 2, 3, 4$. We shall apply Bayes’
theorem conditional on $H_1 \cap H_2$.

\[
\begin{aligned}
\Pr(B_1 \mid H_1 \cap H_2 \cap H_3)
&= \frac{\Pr(B_1 \mid H_1 \cap H_2)\Pr(H_3 \mid B_1 \cap H_1 \cap H_2)}{\Pr(B_1 \mid H_1 \cap H_2)\Pr(H_3 \mid B_1 \cap H_1 \cap H_2) + \Pr(B_2 \mid H_1 \cap H_2)\Pr(H_3 \mid B_2 \cap H_1 \cap H_2)} \\
&= \frac{(1/5) \times (1/2)}{(1/5) \times (1/2) + (4/5) \times 1} = \frac{1}{9}.
\end{aligned}
\]

(b) If the coin ever shows a tail, it can’t have two heads. Hence the posterior probability of $B_1$ becomes
1 after we observe a tail.


14. In Exercise 23 of Sec. 2.2, $B$ is the event that the programming task was easy. In that exercise, we
computed $\Pr(A \mid B) = 0.2215$ and $\Pr(A \mid B^c) = 0.02335$. We are also told that $\Pr(B) = 0.4$. Bayes’
theorem tells us that

\[
\Pr(B \mid A) = \frac{\Pr(B)\Pr(A \mid B)}{\Pr(B)\Pr(A \mid B) + \Pr(B^c)\Pr(A \mid B^c)}
= \frac{0.4 \times 0.2215}{0.4 \times 0.2215 + (1 - 0.4)\,0.02335} = 0.8635.
\]


15. The law of total probability tells us how to compute $\Pr(E_1)$.

\[ \Pr(E_1) = \sum_{i=1}^{11} \Pr(B_i)\,\frac{i-1}{10}. \]

Using the numbers in Example 2.3.8 for $\Pr(B_i)$ we obtain 0.274. This is smaller than the value 0.5
computed in Example 2.3.7 because the prior probabilities in Example 2.3.8 are much higher for the $B_i$
with low values of $i$, particularly $i = 2, 3, 4$, and they are much smaller for those $B_i$ with large values
of $i$. Since $\Pr(E_1)$ is a weighted average of the values $(i - 1)/10$ with the weights being $\Pr(B_i)$ for
$i = 1, \ldots, 11$, the more weight we give to small values of $(i - 1)/10$, the smaller the weighted average
will be.


16. (a) From the description of the problem $\Pr(D_i \mid B) = 0.01$ for all $i$. If we can show that $\Pr(D_i \mid B^c) =
0.01$ for all $i$, then $\Pr(D_i) = 0.01$ for all $i$. We will prove this by induction. We have assumed that
$D_1$ is independent of $B$ and hence it is independent of $B^c$. This makes $\Pr(D_1 \mid B^c) = 0.01$. Now,
assume that $\Pr(D_i \mid B^c) = 0.01$ for all $i \le j$. Write

\[ \Pr(D_{j+1} \mid B^c) = \Pr(D_{j+1} \mid D_j \cap B^c)\Pr(D_j \mid B^c) + \Pr(D_{j+1} \mid D_j^c \cap B^c)\Pr(D_j^c \mid B^c). \]

The induction hypothesis says that $\Pr(D_j \mid B^c) = 0.01$ and $\Pr(D_j^c \mid B^c) = 0.99$. In the problem
description, we have $\Pr(D_{j+1} \mid D_j \cap B^c) = 2/5$ and $\Pr(D_{j+1} \mid D_j^c \cap B^c) = 1/165$. Plugging these into
the preceding equation gives

\[ \Pr(D_{j+1} \mid B^c) = \frac{2}{5} \times 0.01 + \frac{1}{165} \times 0.99 = 0.01. \]

This completes the proof.

(b) It is straightforward to compute

\[ \Pr(E \mid B) = 0.99 \times 0.99 \times 0.01 \times 0.01 \times 0.99 \times 0.99 = 0.00009606. \]

By the conditional independence assumption stated in the problem description, we have

\[
\Pr(E \mid B^c) = \Pr(D_1^c \mid B^c)\Pr(D_2^c \mid D_1^c \cap B^c)\Pr(D_3 \mid D_2^c \cap B^c)\Pr(D_4 \mid D_3 \cap B^c)\Pr(D_5^c \mid D_4 \cap B^c)\Pr(D_6^c \mid D_5^c \cap B^c).
\]

The six factors on the right side of this equation are respectively 0.99, 164/165, 1/165, 2/5, 3/5,
and 164/165. The product is 0.001423. It follows that

\[
\Pr(B \mid E) = \frac{\Pr(E \mid B)\Pr(B)}{\Pr(E \mid B)\Pr(B) + \Pr(E \mid B^c)\Pr(B^c)}
= \frac{0.00009606 \times (2/3)}{0.00009606 \times (2/3) + 0.001423 \times (1/3)} = 0.1190.
\]
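
The arithmetic in part (b) involves very small numbers, so it can help to carry it out with exact fractions before rounding. The sketch below takes the six conditional factors and the prior Pr(B) = 2/3 directly from the solution; the variable names are illustrative.

```python
from fractions import Fraction as F

# Pr(E | B): the six items are independent given B, two of them defective.
p_e_given_b = F(99, 100) ** 4 * F(1, 100) ** 2

# Pr(E | B^c): product of the six conditional factors listed in the solution.
factors = [F(99, 100), F(164, 165), F(1, 165), F(2, 5), F(3, 5), F(164, 165)]
p_e_given_bc = F(1)
for f in factors:
    p_e_given_bc *= f

prior_b = F(2, 3)
post_b = p_e_given_b * prior_b / (
    p_e_given_b * prior_b + p_e_given_bc * (1 - prior_b)
)
print(float(p_e_given_b), float(p_e_given_bc), float(post_b))
```

The posterior probability of B is much smaller than the prior 2/3 because the observed pattern of adjacent defectives is far more likely under the dependent model.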

2.4 The Gambler’s Ruin Problem


Commentary 


This section is independent of the rest of the book. Instructors can discuss this section at any time that they 
find convenient or they can omit it entirely. 

If Sec. 3.10 on Markov chains has been discussed before this section is discussed, it is helpful to point out
that the game considered here forms a Markov chain with stationary transition probabilities. The state of
the chain at any time is the fortune of gambler A at that time. Therefore, the possible states of the chain are
the $k + 1$ integers $0, 1, \ldots, k$. If the chain is in state $i$ ($i = 1, \ldots, k - 1$) at any time $n$, then at time $n + 1$ it
will move to state $i + 1$ with probability $p$ and it will move to state $i - 1$ with probability $1 - p$. It is assumed
that if the chain is either in the state 0 or the state $k$ at any time, then it will remain in that same state at
every future time. (These are absorbing states.) Therefore, the $(k + 1) \times (k + 1)$ transition matrix $P$ of the
chain is as follows:

\[
P = \begin{pmatrix}
1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
1-p & 0 & p & 0 & \cdots & 0 & 0 & 0 \\
0 & 1-p & 0 & p & \cdots & 0 & 0 & 0 \\
0 & 0 & 1-p & 0 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1-p & 0 & p \\
0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1
\end{pmatrix}
\]

Solutions to Exercises 


1. Clearly $a_i$ in Eq. (2.4.9) is an increasing function of $i$. Hence, if $a_{98} < 1/2$, then $a_i < 1/2$ for all $i \le 98$.
For $i = 98$, Eq. (2.4.9) yields almost exactly 4/9, which is less than 1/2.


2. The probability of winning a fair game is just the ratio of the initial fortune to the total funds available. 
This ratio is the same in all three cases. 


3. If the initial fortune of gambler A is $i$ dollars, then for conditions (a), (b), and (c), the initial fortune of
gambler B is $i/2$ dollars. Hence, $k = 3i/2$. If we let $r = (1 - p)/p > 1$, then it follows from Eq. (2.4.8)
that the probability that A will win under conditions (a), (b), or (c) is

\[ \frac{r^i - 1}{r^{3i/2} - 1} = \frac{1 - (1/r^i)}{r^{i/2} - (1/r^i)}. \]

Writing $x = r^{i/2} > 1$, this probability equals $(x^2 - 1)/(x^3 - 1) = (x + 1)/(x^2 + x + 1)$, which is a
decreasing function of $x$. If $i$ and $j$ are positive integers with $i < j$, then $r^{i/2} < r^{j/2}$, and it now follows that

\[ \frac{1 - (1/r^i)}{r^{i/2} - (1/r^i)} > \frac{1 - (1/r^j)}{r^{j/2} - (1/r^j)}. \]

Thus the larger the initial fortune of gambler A is, the smaller is his probability of winning. Therefore,
he has the largest probability of winning under condition (a).


4. If we consider this problem from the point of view of gambler B, then each play of the game is 
unfavorable to her. Hence, by a procedure similar to that described in the solution to Exercise 3, it 
follows that she has the smallest probability of winning when her initial fortune is largest. Therefore, 
gambler A has the largest probability of winning when her initial fortune is largest, which corresponds 
to condition (c). 

5. In this exercise, $p = 1/2$ and $k = i + 2$. Therefore $a_i = i/(i + 2)$. In order to make $a_i \ge 0.99$, we must
have $i \ge 198$.


6. In this exercise $p = 2/3$, and $k = i + 2$. Therefore, by Eq. (2.4.9),

\[ a_i = \frac{\left(\frac{1}{2}\right)^{i} - 1}{\left(\frac{1}{2}\right)^{i+2} - 1}. \]

It follows that $a_i \ge 0.99$ if and only if

\[ 2^i \ge 75.25. \]

Therefore, we must have $i \ge 7$.


7. In this exercise $p = 1/3$ and $k = i + 2$. Therefore, by Eq. (2.4.9)

\[ a_i = \frac{2^i - 1}{2^{i+2} - 1} = \frac{1 - (1/2^i)}{4 - (1/2^i)}. \]

But for every number $x$ ($0 < x < 1$), we have $\frac{1 - x}{4 - x} < \frac{1}{4}$. Hence, $a_i < 1/4$ for every positive integer $i$.


8. This problem can be expressed as a gambler’s ruin problem. Suppose that the initial fortunes of both
gambler A and gambler B are 3 dollars, that gambler A will win one dollar from gambler B whenever
a head is obtained on the coin, and gambler B will win one dollar from gambler A whenever a tail is
obtained on the coin. Then the condition that $X_n = Y_n + 3$ means that A has won all of B’s fortune, and
the condition that $Y_n = X_n + 3$ means that A is ruined. Therefore, if $p = 1/2$, the required probability
is given by Eq. (2.4.6) with $i = 3$ and $k = 6$, and the answer is 1/2. If $p \ne 1/2$, the required probability
is given by Eq. (2.4.9) with $i = 3$ and $k = 6$. In either case, the answer can be expressed in the form

\[ \frac{1}{\left(\frac{1-p}{p}\right)^{3} + 1}. \]

9. This problem can be expressed as a gambler’s ruin problem. We consider the initial fortune of gambler 
A to be five dollars and the initial fortune of gambler B to be ten dollars. Gambler A wins one dollar 
from gambler B each time that box B is selected, and gambler B wins one dollar from gambler A each 
time that box A is selected. Since i=5,k = 15, and p = 1/2, it follows from Eq. (2.4.6) that the 


probability that gambler A will win (and box B will become empty) is 1/3. Therefore, the probability 
that box A will become empty first is 2/3. 


2.5 Supplementary Exercises 


Solutions to Exercises 


1. Let $\Pr(D) = p > 0$. Then

\[
\begin{aligned}
\Pr(A) &= p\Pr(A \mid D) + (1 - p)\Pr(A \mid D^c) \\
&\ge p\Pr(B \mid D) + (1 - p)\Pr(B \mid D^c) = \Pr(B).
\end{aligned}
\]

2. (a) Sample space:

    HT      TH
    HHT     TTH
    HHHT    TTTH
    ...     ...


3. Since $\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)}$ and $\Pr(B \mid A) = \frac{\Pr(A \cap B)}{\Pr(A)}$, we have

\[ \frac{\Pr(A \cap B)}{1/5} + \frac{\Pr(A \cap B)}{1/3} = \frac{2}{3}. \]

Hence $\Pr(A \cap B) = 1/12$, and $\Pr(A^c \cup B^c) = 1 - \Pr(A \cap B) = 11/12$.


4. $\Pr(A \cup B^c \mid B) = \Pr(A \mid B) + \Pr(B^c \mid B) - \Pr(A \cap B^c \mid B) = \Pr(A) + 0 - 0 = \Pr(A)$.


5. The probability of obtaining the number 6 exactly three times in ten rolls is $a = \binom{10}{3}\left(\frac{1}{6}\right)^3\left(\frac{5}{6}\right)^7$.
Hence, the probability of obtaining the number 6 on the first three rolls and on none of the subsequent
rolls is $b = \left(\frac{1}{6}\right)^3\left(\frac{5}{6}\right)^7$. Hence, the required probability is $\frac{b}{a} = 1\Big/\binom{10}{3}$.


6. $\Pr(A \cap B) = \frac{\Pr(A \cap B \cap D)}{\Pr(D \mid A \cap B)} = \frac{0.04}{0.25} = 0.16$. But also, by independence,

\[ \Pr(A \cap B) = \Pr(A)\Pr(B) = 4[\Pr(A)]^2. \]

Hence, $4[\Pr(A)]^2 = 0.16$, so $\Pr(A) = 0.2$. It now follows that

\[ \Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B) = (0.2) + 4(0.2) - (0.16) = 0.84. \]


7. The three events are always independent under the stated conditions. The proof is a straightforward
generalization of the proof of Exercise 2 in Sec. 2.2.


8. No, since $\Pr(A \cap B) = 0$ but $\Pr(A)\Pr(B) > 0$. This also follows from Theorem 2.2.3.


9. Let $\Pr(A) = p$. Then $\Pr(A \cap B) = \Pr(A \cap B \cap C) = 0$, $\Pr(A \cap C) = 4p^2$, $\Pr(B \cap C) = 8p^2$. Therefore,
by Theorem 1.10.1, $5p = p + 2p + 4p - [0 + 4p^2 + 8p^2] + 0$, and $p = 1/6$.


10. $\Pr(\text{Sum} = 7) = 2\Pr[(1, 6)] + 2\Pr[(2, 5)] + 2\Pr[(3, 4)] = 2(0.1)(0.1) + 2(0.1)(0.1) + 2(0.3)(0.3) = 0.22$.


11. $1 - \Pr(\text{losing 50 times}) = 1 - \left(\frac{49}{50}\right)^{50}$.


12. The event will occur when $(X_1, X_2, X_3)$ has the following values:

    (6,5,1)  (6,4,1)  (6,3,1)  (5,4,1)  (5,3,1)
    (6,5,2)  (6,4,2)  (6,3,2)  (5,4,2)  (5,3,2)
    (6,5,3)  (6,4,3)  (6,2,1)  (5,4,3)  (5,2,1)
    (6,5,4)  (4,3,1)  (4,3,2)  (4,2,1)  (3,2,1)

Each of these 20 points has probability $1/6^3$, so the answer is $20/216 = 5/54$.
13. Let $A$, $B$, and $C$ stand for the events that each of the students is in class on a particular day.

(a) We want $\Pr(A \cup B \cup C)$. We can use Theorem 1.10.1. Independence makes it easy to compute the probabilities of the various intersections.
\[ \Pr(A \cup B \cup C) = 0.3 + 0.5 + 0.8 - [0.3 \times 0.5 + 0.3 \times 0.8 + 0.5 \times 0.8] + 0.3 \times 0.5 \times 0.8 = 0.93. \]
(b) Once again, use independence to calculate probabilities of intersections.
\[ \Pr(A \cap B^c \cap C^c) + \Pr(A^c \cap B \cap C^c) + \Pr(A^c \cap B^c \cap C) = (0.3)(0.5)(0.2) + (0.7)(0.5)(0.2) + (0.7)(0.5)(0.8) = 0.38. \]
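Both parts of Exercise 13 can be checked by brute-force enumeration of the eight attendance patterns. The following is only a sketch: the function name `attendance_distribution` is an illustrative choice, and the three probabilities are the ones used in the solution above.

```python
from itertools import product

# Attendance probabilities for students A, B, and C (values from Exercise 13).
p = {"A": 0.3, "B": 0.5, "C": 0.8}

def attendance_distribution(probs):
    """Return {k: Pr(exactly k students attend)} under independence."""
    names = list(probs)
    dist = {k: 0.0 for k in range(len(names) + 1)}
    for outcome in product([True, False], repeat=len(names)):
        weight = 1.0
        for name, present in zip(names, outcome):
            weight *= probs[name] if present else 1.0 - probs[name]
        dist[sum(outcome)] += weight
    return dist

dist = attendance_distribution(p)
at_least_one = 1.0 - dist[0]   # part (a)
exactly_one = dist[1]          # part (b)
print(round(at_least_one, 2), round(exactly_one, 2))
```

Enumeration avoids the inclusion-exclusion bookkeeping, so it serves as an independent check on Theorem 1.10.1.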
14. Seven games will be required if and only if team A wins exactly three of the first six games. This probability is $\dfrac{6!}{3!\,3!}\,p^3(1-p)^3$, following the model calculation in Example 2.2.5.


15. $\Pr(\text{Each box contains one red ball}) = \dfrac{3!}{3^3} = \dfrac{2}{9} = \Pr(\text{Each box contains one white ball})$.

So $\Pr(\text{Each box contains both colors}) = \left(\dfrac{2}{9}\right)^2$.


16. Let $A_i$ be the event that box $i$ has at least three balls. Then
\[ \Pr(A_i) = \sum_{j=3}^{5} \Pr(\text{Box } i \text{ has exactly } j \text{ balls}) = \frac{\binom{5}{3}(n-1)^2}{n^5} + \frac{\binom{5}{4}(n-1)}{n^5} + \frac{1}{n^5} = p, \text{ say.} \]
Since there are only five balls, it is impossible for two boxes to have at least three balls at the same time. Therefore, the events $A_i$ are disjoint, and the probability that at least one of the events $A_i$ occurs is $np$. Hence, the probability that no box contains more than two balls is $1 - np$.


17. $\Pr(U + V = j)$ is as follows, for $j = 0, 1, \ldots, 18$:


j Prob. j Prob.
0 0.01 10 0.09 
1 0.02 11 0.08 
2 0.03 12 0.07 
3 0.04 13 0.06 
4 0.05 14 0.05 
5 0.06 15 0.04 
6 0.07 16 0.03 
7 0.08 17 0.02 
8 0.09 18 0.01 
9 0.10 
Thus
\[ \Pr(U + V = W + X) = \sum_{j=0}^{18} \Pr(U + V = j)\Pr(W + X = j) = (0.01)^2 + (0.02)^2 + \cdots + (0.01)^2 = 0.067. \]
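The table and the final sum can be reproduced exactly with rational arithmetic. This sketch assumes, consistently with the tabulated values, that $U$, $V$, $W$, and $X$ are independent digits, each uniform on $0, \ldots, 9$; the helper name `convolve` is illustrative.

```python
from fractions import Fraction

# Assumed model: each digit is uniform on 0..9 and the four digits are independent.
digit = {d: Fraction(1, 10) for d in range(10)}

def convolve(f, g):
    """p.f. of the sum of two independent variables with p.f.'s f and g."""
    out = {}
    for a, pa in f.items():
        for b, pb in g.items():
            out[a + b] = out.get(a + b, 0) + pa * pb
    return out

sum_pf = convolve(digit, digit)                  # p.f. of U + V (and of W + X)
match = sum(p * p for p in sum_pf.values())      # Pr(U + V = W + X)
print(float(sum_pf[9]), float(match))
```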
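18. Let $A_i$ denote the event that member $i$ does not serve on any of the three committees ($i = 1, \ldots, 8$). Then
\[ \Pr(A_i) = \frac{\binom{7}{3}}{\binom{8}{3}} \cdot \frac{\binom{7}{4}}{\binom{8}{4}} \cdot \frac{\binom{7}{5}}{\binom{8}{5}} = \frac{5}{8} \cdot \frac{4}{8} \cdot \frac{3}{8}, \]
\[ \Pr(A_i \cap A_j) = \frac{\binom{6}{3}}{\binom{8}{3}} \cdot \frac{\binom{6}{4}}{\binom{8}{4}} \cdot \frac{\binom{6}{5}}{\binom{8}{5}} = \left(\frac{5}{8} \cdot \frac{4}{7}\right)\left(\frac{4}{8} \cdot \frac{3}{7}\right)\left(\frac{3}{8} \cdot \frac{2}{7}\right) \quad \text{for } i < j, \]
\[ \Pr(A_i \cap A_j \cap A_k) = \frac{\binom{5}{3}}{\binom{8}{3}} \cdot \frac{\binom{5}{4}}{\binom{8}{4}} \cdot \frac{\binom{5}{5}}{\binom{8}{5}} = \left(\frac{5}{8} \cdot \frac{4}{7} \cdot \frac{3}{6}\right)\left(\frac{4}{8} \cdot \frac{3}{7} \cdot \frac{2}{6}\right)\left(\frac{3}{8} \cdot \frac{2}{7} \cdot \frac{1}{6}\right) \quad \text{for } i < j < k, \]
\[ \Pr(A_i \cap A_j \cap A_k \cap A_\ell) = 0 \quad \text{for } i < j < k < \ell. \]

Hence, by Theorem 1.10.2,
\[ \Pr\left(\bigcup_{i=1}^{8} A_i\right) = 8\Pr(A_1) - \binom{8}{2}\Pr(A_1 \cap A_2) + \binom{8}{3}\Pr(A_1 \cap A_2 \cap A_3) = 0.7207. \]

Therefore, the required probability is $1 - 0.7207 = 0.2793$.


19. Let $E_i$ be the event that A and B are both selected for committee $i$ ($i = 1, 2, 3$) and let $\Pr(E_i) = p_i$. Then
\[ p_1 = \frac{\binom{6}{1}}{\binom{8}{3}} \approx 0.1071, \qquad p_2 = \frac{\binom{6}{2}}{\binom{8}{4}} \approx 0.2143, \qquad p_3 = \frac{\binom{6}{3}}{\binom{8}{5}} \approx 0.3571. \]

Since $E_1$, $E_2$, and $E_3$ are independent, it follows from Theorem 1.10.1 that the required probability is
\[ \Pr(E_1 \cup E_2 \cup E_3) = p_1 + p_2 + p_3 - p_1p_2 - p_2p_3 - p_1p_3 + p_1p_2p_3 = 0.5490. \]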


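20. Let $E$ denote the event that B wins. B will win if A misses on her first turn and B wins on her first turn, which has probability $\left(\frac{5}{6}\right)\left(\frac{1}{6}\right)$, or if both players miss on their first turn and B then goes on to subsequently win, which has probability $\left(\frac{5}{6}\right)^2 \Pr(E)$. (See Exercise 17, Sec. 2.2.) Hence, $\Pr(E) = \left(\frac{5}{6}\right)\left(\frac{1}{6}\right) + \left(\frac{5}{6}\right)^2 \Pr(E)$, and $\Pr(E) = \frac{5}{11}$. This problem could also be solved by summing the infinite series
\[ \left(\frac{5}{6}\right)\left(\frac{1}{6}\right) + \left(\frac{5}{6}\right)^3\left(\frac{1}{6}\right) + \left(\frac{5}{6}\right)^5\left(\frac{1}{6}\right) + \cdots. \]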
21. A will win if he wins on his first toss (probability 1/2) or if all three players miss on their first tosses (probability 1/8) and then A subsequently wins. Hence,
\[ \Pr(A \text{ wins}) = \frac{1}{2} + \frac{1}{8}\Pr(A \text{ wins}), \]
and $\Pr(A \text{ wins}) = 4/7$.

Similarly, B will win if A misses on his first toss and B wins on his first toss, or if all three players miss on their first tosses and then B subsequently wins. Hence,
\[ \Pr(B \text{ wins}) = \frac{1}{4} + \frac{1}{8}\Pr(B \text{ wins}), \]
and $\Pr(B \text{ wins}) = 2/7$. Thus, $\Pr(C \text{ wins}) = 1 - 4/7 - 2/7 = 1/7$.
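The renewal argument in Exercise 21 amounts to solving a linear equation of the form $x = a + rx$, where $a$ is the chance of winning in the first round and $r$ is the chance that the round ends with no winner. A minimal sketch follows; the function name `win_probability` is an illustrative choice, not notation from the text.

```python
from fractions import Fraction

def win_probability(first_round_win, all_miss):
    """Solve x = first_round_win + all_miss * x for x."""
    return first_round_win / (1 - all_miss)

half = Fraction(1, 2)
all_miss = half ** 3                           # all three players miss in a round

pr_a = win_probability(half, all_miss)         # A wins on his first toss
pr_b = win_probability(half ** 2, all_miss)    # A misses, then B wins
pr_c = win_probability(half ** 3, all_miss)    # A and B miss, then C wins
print(pr_a, pr_b, pr_c)
```

Computing C's probability directly, rather than by subtraction, gives a check that the three values sum to 1.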


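22. Let $A_j$ denote the outcome of the $j$th roll. Then
\[ \Pr(X = x) = \Pr(A_2 \ne A_1, A_3 \ne A_2, \ldots, A_{x-1} \ne A_{x-2}, A_x = A_{x-1}) \]
\[ = \Pr(A_2 \ne A_1)\Pr(A_3 \ne A_2 \mid A_2 \ne A_1) \cdots \Pr(A_x = A_{x-1} \mid A_{x-1} \ne A_{x-2}, \text{etc.}) \]
\[ = \underbrace{\left(\frac{5}{6}\right)\left(\frac{5}{6}\right) \cdots \left(\frac{5}{6}\right)}_{x-2 \text{ factors}} \left(\frac{1}{6}\right) = \left(\frac{5}{6}\right)^{x-2}\left(\frac{1}{6}\right). \]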


23. Let $A$ be the event that the person you meet is a statistician, and let $B$ be the event that he is shy. Then
\[ \Pr(A \mid B) = \frac{(0.8)(0.1)}{(0.8)(0.1) + (0.15)(0.9)} = 0.372. \]

24. $\Pr(A \mid \text{lemon}) = \dfrac{(0.05)(0.2)}{(0.05)(0.2) + (0.02)(0.5) + (0.1)(0.3)} = \dfrac{1}{5}$.

25. (a) $\Pr(\text{Defective} \mid \text{Removed}) = \dfrac{(0.9)(0.3)}{(0.9)(0.3) + (0.2)(0.7)} = \dfrac{27}{41} = 0.659$.

(b) $\Pr(\text{Defective} \mid \text{Not Removed}) = \dfrac{(0.1)(0.3)}{(0.1)(0.3) + (0.8)(0.7)} = \dfrac{3}{59} = 0.051$.
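Exercises 23 to 25 are all direct applications of Bayes' theorem over a finite partition. The sketch below checks Exercise 25, taking the prior probability of a defective item as 0.3 and the removal probabilities as 0.9 (defective) and 0.2 (nondefective), as they appear in the solution; the helper `posterior` is an illustrative name.

```python
def posterior(priors, likelihoods):
    """Bayes' theorem: posterior probabilities over a finite partition."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

priors = [0.3, 0.7]                       # [defective, nondefective]

# Part (a): condition on the item being removed.
removed = posterior(priors, [0.9, 0.2])
# Part (b): condition on the item not being removed.
not_removed = posterior(priors, [0.1, 0.8])

print(round(removed[0], 3), round(not_removed[0], 3))
```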


26. Let $X$ and $Y$ denote the number of tosses required on the first experiment and second experiment, respectively. Then $X = n$ if and only if the first $n - 1$ tosses of the first experiment are tails and the $n$th toss is a head, which has probability $1/2^n$. Furthermore, $Y > n$ if and only if the first $n$ tosses of the second experiment are all tails, which also has probability $1/2^n$.

Hence
\[ \Pr(Y > X) = \sum_{n=1}^{\infty} \Pr(Y > n \mid X = n)\Pr(X = n) = \sum_{n=1}^{\infty} \frac{1}{2^n} \cdot \frac{1}{2^n} = \sum_{n=1}^{\infty} \frac{1}{4^n} = \frac{1}{3}. \]
27. Let $A$ denote the event that the family has at least one boy, and $B$ the event that it has at least one girl. Then
\[ \Pr(A) = 1 - (1/2)^n, \]
\[ \Pr(A \cap B) = 1 - \Pr(\text{All girls}) - \Pr(\text{All boys}) = 1 - (1/2)^n - (1/2)^n. \]
Hence,
\[ \Pr(B \mid A) = \frac{\Pr(A \cap B)}{\Pr(A)} = \frac{1 - (1/2)^{n-1}}{1 - (1/2)^n}. \]


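28. (a) Let $X$ denote the number of heads. Then
\[ \Pr(X = n - 1 \mid X \ge n - 2) = \frac{\Pr(X = n - 1)}{\Pr(X \ge n - 2)} = \frac{\binom{n}{n-1}(1/2)^n}{\left[\binom{n}{n-2} + \binom{n}{n-1} + \binom{n}{n}\right](1/2)^n} = \frac{n}{\frac{n(n-1)}{2} + n + 1} = \frac{2n}{n^2 + n + 2}. \]

(b) The required probability is the probability of obtaining exactly one head on the last two tosses, namely 1/2.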


29. (a) Let $X$ denote the number of aces selected. Then
\[ \Pr(X = i) = \frac{\binom{4}{i}\binom{48}{13-i}}{\binom{52}{13}}, \quad i = 0, 1, 2, 3, 4. \]
The required probability is
\[ \Pr(X \ge 2 \mid X \ge 1) = \frac{1 - \Pr(X = 0) - \Pr(X = 1)}{1 - \Pr(X = 0)} = \frac{1 - 0.3038 - 0.4388}{1 - 0.3038} = 0.3697. \]

(b) Let $A$ denote the event that the ace of hearts and no other aces are obtained, and let $H$ denote the event that the ace of hearts is obtained. Then
\[ \Pr(A) = \frac{\binom{48}{12}}{\binom{52}{13}} = 0.1097, \qquad \Pr(H) = \frac{13}{52} = 0.25. \]
The required probability is
\[ \frac{\Pr(H) - \Pr(A)}{\Pr(H)} = \frac{0.25 - 0.1097}{0.25} = 0.5612. \]
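The two conditional probabilities in Exercise 29 can be recomputed from binomial coefficients. This is a sketch using only the standard library; `pr_aces` is an illustrative name for the hypergeometric p.f. of the number of aces in a 13-card hand.

```python
from math import comb

def pr_aces(i, hand=13, aces=4, deck=52):
    """Pr(exactly i aces in a hand drawn without replacement)."""
    return comb(aces, i) * comb(deck - aces, hand - i) / comb(deck, hand)

# Part (a): at least two aces given at least one ace.
p0, p1 = pr_aces(0), pr_aces(1)
part_a = (1 - p0 - p1) / (1 - p0)

# Part (b): at least two aces given that the ace of hearts is in the hand.
pr_h = 13 / 52
pr_only_heart_ace = comb(48, 12) / comb(52, 13)
part_b = (pr_h - pr_only_heart_ace) / pr_h

print(round(p0, 4), round(p1, 4), round(part_b, 4))
```

Small differences in the last digit relative to the hand calculation can arise because the solution rounds the intermediate probabilities before dividing.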
30. The probability that a particular letter, say letter A, will be placed in the correct envelope is $1/n$. The probability that none of the other $n - 1$ letters will then be placed in the correct envelope is $q_{n-1} = 1 - p_{n-1}$. Therefore, the probability that only letter A, and no other letter, will be placed in the correct envelope is $q_{n-1}/n$. It follows that the probability that exactly one of the $n$ letters will be placed in the correct envelope, without specifying which letter will be correctly placed, is $nq_{n-1}/n = q_{n-1}$.

31. The probability that two specified letters will be placed in the correct envelopes is $1/[n(n-1)]$. The probability that none of the other $n - 2$ letters will then be placed in the correct envelopes is $q_{n-2}$. Therefore, the probability that only the two specified letters, and no other letters, will be placed in the correct envelopes is $\dfrac{1}{n(n-1)}\,q_{n-2}$. It follows that the probability that exactly two of the $n$ letters will be placed in the correct envelopes, without specifying which pair will be correctly placed, is
\[ \binom{n}{2}\frac{1}{n(n-1)}\,q_{n-2} = \frac{1}{2}\,q_{n-2}. \]

32. The probability that exactly one student will be in class is
\[ \Pr(A)\Pr(B^c) + \Pr(A^c)\Pr(B) = (0.8)(0.4) + (0.2)(0.6) = 0.44. \]
The probability that exactly one student will be in class and that student will be A is
\[ \Pr(A)\Pr(B^c) = 0.32. \]
Hence, the required probability is $\dfrac{32}{44} = \dfrac{8}{11}$.

33. By Exercise 3 of Sec. 1.10, the probability that a family subscribes to exactly one of the three newspapers is 0.45. As can be seen from the solution to that exercise, the probability that a family subscribes only to newspaper A is 0.35. Hence, the required probability is $35/45 = 7/9$.

34. A more reasonable analysis by prisoner A might proceed as follows: The pair to be executed is equally likely to be $(A,B)$, $(A,C)$, or $(B,C)$. If it is $(A,B)$ or $(A,C)$, the jailer will surely respond B or C, respectively. If it is $(B,C)$, the jailer is equally likely to respond B or C. Hence, if the jailer responds B, the conditional probability that the pair to be executed is $(A,B)$ is
\[ \Pr[(A,B) \mid \text{response}] = \frac{1 \cdot \Pr(A,B)}{1 \cdot \Pr(A,B) + 0 \cdot \Pr(A,C) + \frac{1}{2}\Pr(B,C)} = \frac{\frac{1}{3}}{\frac{1}{3} + \frac{1}{6}} = \frac{2}{3}. \]
Thus, the probability that A will be executed is the same 2/3 as it was before he questioned the jailer. This answer will change if the probability that the jailer will respond B, given $(B,C)$, is assumed to be some value other than 1/2.

35. The second situation, with stakes of two dollars, is equivalent to the situation in which A and B have initial fortunes of 25 dollars and bet one dollar on each play. In the notation of Sec. 2.4, we have $i = 50$ and $k = 100$ in the first situation and $i = 25$ and $k = 50$ in the second situation. Hence, if $p = 1/2$, it follows from Eq. (2.4.6) that gambler A has the same probability 1/2 of ruining gambler B in either situation. If $p \ne 1/2$, then it follows from Eq. (2.4.9) that the probabilities $\alpha_1$ and $\alpha_2$ of winning in the two situations equal the values
\[ \alpha_1 = \frac{([1-p]/p)^{50} - 1}{([1-p]/p)^{100} - 1} = \frac{1}{([1-p]/p)^{50} + 1}, \]
\[ \alpha_2 = \frac{([1-p]/p)^{25} - 1}{([1-p]/p)^{50} - 1} = \frac{1}{([1-p]/p)^{25} + 1}. \]
Hence, if $p < 1/2$, then $([1-p]/p) > 1$ and $\alpha_2 > \alpha_1$. If $p > 1/2$, then $([1-p]/p) < 1$ and $\alpha_1 > \alpha_2$.
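The comparison between the two betting schemes can be explored numerically. The sketch below assumes the standard gambler's ruin formula in the form used above (initial fortune $i$, total fortune $k$, win probability $p$ per play); `win_probability` is an illustrative name.

```python
def win_probability(p, i, k):
    """Probability that gambler A reaches fortune k before 0, starting from i."""
    if p == 0.5:
        return i / k
    ratio = (1 - p) / p
    return (ratio**i - 1) / (ratio**k - 1)

for p in (0.49, 0.5, 0.51):
    alpha_1 = win_probability(p, 50, 100)   # one-dollar stakes
    alpha_2 = win_probability(p, 25, 50)    # two-dollar stakes, rescaled
    print(p, round(alpha_1, 4), round(alpha_2, 4))
```

Even a small disadvantage per play favors the larger stakes, since fewer plays give the unfavorable odds less time to act.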


36. (a) Since each candidate is equally likely to appear at each point in the sequence, the one who happens to be the best out of the first $i$ has probability $r/i$ of appearing in the first $r$ interviews when $i > r$.

(b) If $i \le r$, then $A$ and $B_i$ are disjoint and $\Pr(A \cap B_i) = 0$ because we cannot hire any of the first $r$ candidates. So $\Pr(A \mid B_i) = \Pr(A \cap B_i)/\Pr(B_i) = 0$. Next, let $i > r$ and assume that $B_i$ occurs. Let $C_i$ denote the event that we keep interviewing until we see candidate $i$. If $C_i$ also occurs, then we shall rank candidate $i$ higher than any of the ones previously seen and the algorithm tells us to stop and hire candidate $i$. In this case $A$ occurs. This means that $B_i \cap C_i \subset A$. However, if $C_i$ fails, then we shall hire someone before we get to interview candidate $i$ and $A$ will not occur. This means that $B_i \cap C_i^c \cap A = \emptyset$. Since $B_i \cap A = (B_i \cap C_i \cap A) \cup (B_i \cap C_i^c \cap A)$, we have $B_i \cap A = B_i \cap C_i$ and $\Pr(B_i \cap A) = \Pr(B_i \cap C_i)$. So $\Pr(A \mid B_i) = \Pr(C_i \mid B_i)$. Conditional on $B_i$, $C_i$ occurs if and only if the best of the first $i - 1$ candidates appears in the first $r$ positions. The conditional probability of $C_i$ given $B_i$ is then $r/(i-1)$.

(c) If we use the value $r > 0$ to determine our algorithm, then we can compute
\[ p_r = \Pr(A) = \sum_{i=1}^{n} \Pr(B_i)\Pr(A \mid B_i) = \sum_{i=r+1}^{n} \frac{1}{n} \cdot \frac{r}{i-1} = \frac{r}{n}\sum_{i=r+1}^{n} \frac{1}{i-1}. \]
For $r = 0$, if we take $r/r = 1$, then only the first term in the sum produces a nonzero result and $p_0 = 1/n$. This is indeed the probability that the first candidate will be the best one seen so far when the first interview occurs.

(d) Using the formula for $p_r$ with $r > 0$, we have
\[ q_r = p_r - p_{r-1} = \frac{1}{n}\left[\sum_{i=r+1}^{n} \frac{1}{i-1} - 1\right], \]
which clearly decreases as $r$ increases because the terms in the sum are the same for all $r$, but there are fewer terms when $r$ is larger. Since all the terms are positive, $q_r$ is strictly decreasing.

(e) Since $p_r = q_r + p_{r-1}$ for $r \ge 1$, we have that $p_r = p_0 + q_1 + \cdots + q_r$. If there exists $r$ such that $q_r \le 0$, then $q_j < 0$ for all $j > r$ and $p_j < p_{j-1}$ for all $j > r$. On the other hand, for each $r$ such that $q_r > 0$, $p_r > p_{r-1}$. Hence, we should choose $r$ to be the last value such that $q_r > 0$.

(f) For $n = 10$, the first few values of $q_r$ are

r     1       2       3       4
q_r   0.1829  0.0829  0.0329  −0.0004

So, we should use $r = 3$. We can then compute $p_3 = 0.3987$.
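The choice of $r$ for $n = 10$ can be confirmed by evaluating $p_r$ for every $r$. The sketch below uses the formula $p_r = (r/n)\sum_{i=r+1}^{n} 1/(i-1)$ with the convention $p_0 = 1/n$; `success_probability` is an illustrative name.

```python
from fractions import Fraction

def success_probability(n, r):
    """Pr(hiring the best of n candidates) when the first r are only observed."""
    if r == 0:
        return Fraction(1, n)
    return Fraction(r, n) * sum(Fraction(1, i - 1) for i in range(r + 1, n + 1))

n = 10
p = [success_probability(n, r) for r in range(n)]
best_r = max(range(n), key=lambda r: p[r])

for r in range(5):
    print(r, round(float(p[r]), 4))
print("best r:", best_r)
```

Exact fractions avoid any doubt about the sign of the small difference $p_4 - p_3$.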
Random Variables and Distributions


3.1 Random Variables and Discrete Distributions 


Solutions to Exercises 


1. Each of the 11 integers from 10 to 20 has the same probability of being the value of $X$. Six of the 11 integers are even, so the probability that $X$ is even is 6/11.

2. The sum of the values of $f(x)$ must be equal to 1. Since $\sum_{x=1}^{5} f(x) = 15c$, we must have $c = 1/15$.

3. By looking over the 36 possible outcomes enumerated in Example 1.6.5, we find that $X = 0$ for 6 outcomes, $X = 1$ for 10 outcomes, $X = 2$ for 8 outcomes, $X = 3$ for 6 outcomes, $X = 4$ for 4 outcomes, and $X = 5$ for 2 outcomes. Hence, the p.f. $f(x)$ is as follows:

x     0     1     2     3     4     5
f(x)  3/18  5/18  4/18  3/18  2/18  1/18

4. For $x = 0, 1, \ldots, 10$, the probability of obtaining exactly $x$ heads is $\binom{10}{x}\left(\frac{1}{2}\right)^{10}$.

5. For $x = 2, 3, 4, 5$, the probability of obtaining exactly $x$ red balls is $\binom{7}{x}\binom{3}{5-x}\Big/\binom{10}{5}$.

6. The desired probability is the sum of the entries for $k = 0, 1, 2, 3, 4$, and 5 in that part of the table of binomial probabilities given in the back of the book corresponding to $n = 15$ and $p = 0.5$. The sum is 0.1509.

7. Suppose that a machine produces a defective item with probability 0.7 and produces a nondefective item with probability 0.3. If $X$ denotes the number of defective items that are obtained when 8 items are inspected, then the random variable $X$ will have the binomial distribution with parameters $n = 8$ and $p = 0.7$. By the same reasoning, however, if $Y$ denotes the number of nondefective items that are obtained, then $Y$ will have the binomial distribution with parameters $n = 8$ and $p = 0.3$. Furthermore, $Y = 8 - X$. Therefore, $X \ge 5$ if and only if $Y \le 3$ and it follows that $\Pr(X \ge 5) = \Pr(Y \le 3)$. Probabilities for the binomial distribution with $n = 8$ and $p = 0.3$ are given in the table in the back of the book. The value of $\Pr(Y \le 3)$ will be the sum of the entries for $k = 0, 1, 2$, and 3.
8. The number of red balls obtained will have the binomial distribution with parameters $n = 20$ and $p = 0.1$. The required probability can be found from the table of binomial probabilities in the back of the book. Add up the numbers in the $n = 20$ and $p = 0.1$ section from $k = 4$ to $k = 20$. Or add up the numbers from $k = 0$ to $k = 3$ and subtract the sum from 1. The answer is 0.1330.
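When the binomial table is not at hand, the same sums can be computed directly from the p.f. The following sketch uses only the standard library; `binom_pf` and `binom_cdf` are illustrative helper names.

```python
from math import comb

def binom_pf(k, n, p):
    """Binomial p.f.: Pr(X = k) for parameters n and p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """Pr(X <= k), obtained by summing the p.f. from 0 to k."""
    return sum(binom_pf(j, n, p) for j in range(k + 1))

at_most_five = binom_cdf(5, 15, 0.5)         # n = 15, p = 0.5, k = 0..5
at_least_four = 1 - binom_cdf(3, 20, 0.1)    # n = 20, p = 0.1, k = 4..20
print(round(at_most_five, 4), round(at_least_four, 4))
```

Taking the complement for the upper tail mirrors the second method described in the solution and needs only four terms instead of seventeen.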


9. We need $\sum_{x=0}^{\infty} f(x) = 1$, which means that $c = 1\Big/\sum_{x=0}^{\infty} 2^{-x}$. The last sum is known from calculus to equal $1/(1 - 1/2) = 2$, so $c = 1/2$.

10. (a) The p.f. of $X$ is $f(x) = c(x+1)(8-x)$ for $x = 0, \ldots, 7$, where $c$ is chosen so that $\sum_{x=0}^{7} f(x) = 1$. So, $c$ is one over $\sum_{x=0}^{7} (x+1)(8-x)$, which sum equals 120, so $c = 1/120$. That is, $f(x) = (x+1)(8-x)/120$ for $x = 0, \ldots, 7$.

(b) $\Pr(X \ge 5) = [(5+1)(8-5) + (6+1)(8-6) + (7+1)(8-7)]/120 = 1/3$.

11. In order for the specified function to be a p.f., it must be the case that $\sum_{x=1}^{\infty} \dfrac{c}{x} = 1$ or, equivalently, $\sum_{x=1}^{\infty} \dfrac{1}{x} = \dfrac{1}{c}$. But $\sum_{x=1}^{\infty} \dfrac{1}{x} = \infty$, so there cannot be such a constant $c$.


3.2 Continuous Distributions 


Commentary 


This section ends with a brief discussion of probability distributions that are neither discrete nor continuous. 
Although such distributions have great theoretical interest and occasionally arise in practice, students can 
go a long way without actually concerning themselves about these distributions. 


Solutions to Exercises 
1. We compute $\Pr(X < 8/27)$ by integrating the p.d.f. from 0 to 8/27.
\[ \Pr\left(X < \frac{8}{27}\right) = \int_0^{8/27} \frac{2}{3}x^{-1/3}\,dx = x^{2/3}\Big|_0^{8/27} = \frac{4}{9}. \]

2. The p.d.f. has the appearance of Fig. S.3.1.

Figure S.3.1: Figure for Exercise 2 of Sec. 3.2.

(a) $\Pr\left(X < \frac{1}{2}\right) = \int_0^{1/2} 4(1 - x^3)\,dx/3 = 0.6458$.

(b) $\Pr\left(\frac{1}{4} < X < \frac{3}{4}\right) = \int_{1/4}^{3/4} 4(1 - x^3)\,dx/3 = 0.5625$.

(c) $\Pr\left(X > \frac{1}{3}\right) = \int_{1/3}^{1} 4(1 - x^3)\,dx/3 = 0.5597$.


3. The p.d.f. has the appearance of Fig. S.3.2.

Figure S.3.2: Figure for Exercise 3 of Sec. 3.2.

(a) $\Pr(X < 0) = \dfrac{1}{36}\int_{-3}^{0} (9 - x^2)\,dx = 0.5$.

(b) $\Pr(-1 < X < 1) = \dfrac{1}{36}\int_{-1}^{1} (9 - x^2)\,dx = 0.4815$.

(c) $\Pr(X > 2) = \dfrac{1}{36}\int_{2}^{3} (9 - x^2)\,dx = 0.07407$.

The answer in part (a) could also be obtained directly from the fact that the p.d.f. is symmetric about the point $x = 0$. Therefore, the probability to the left of $x = 0$ and the probability to the right of $x = 0$ must each be equal to 1/2.


4. (a) We must have
\[ \int_{-\infty}^{\infty} f(x)\,dx = \int_1^2 cx^2\,dx = \frac{7}{3}c = 1. \]
Therefore, $c = 3/7$. This p.d.f. has the appearance of Fig. S.3.3.

Figure S.3.3: Figure for Exercise 4a of Sec. 3.2.

(b) $\int_{3/2}^{2} f(x)\,dx = 37/56$.

5. (a) $\int_0^t \frac{1}{8}x\,dx = 1/4$, or $t^2/16 = 1/4$. Hence, $t = 2$.

(b) $\int_t^4 (x/8)\,dx = 1/2$, or $1 - t^2/16 = 1/2$. Hence, $t = \sqrt{8}$.
t 
6. The value of $X$ must be between 0 and 4. We will have
\[ Y = \begin{cases} 0 & \text{if } 0 \le X < 1/2, \\ 1 & \text{if } 1/2 < X < 3/2, \\ 2 & \text{if } 3/2 < X < 5/2, \\ 3 & \text{if } 5/2 < X < 7/2, \\ 4 & \text{if } 7/2 < X \le 4. \end{cases} \]
We need not worry about how to define $Y$ if $X \in \{1/2, 3/2, 5/2, 7/2\}$, because the probability that $X$ will be equal to one of these four values is 0. It now follows that
\[ \Pr(Y = 0) = \int_0^{1/2} f(x)\,dx = \frac{1}{64}, \]
\[ \Pr(Y = 1) = \int_{1/2}^{3/2} f(x)\,dx = \frac{1}{8}, \]
\[ \Pr(Y = 2) = \int_{3/2}^{5/2} f(x)\,dx = \frac{1}{4}, \]
\[ \Pr(Y = 3) = \int_{5/2}^{7/2} f(x)\,dx = \frac{3}{8}, \]
\[ \Pr(Y = 4) = \int_{7/2}^{4} f(x)\,dx = \frac{15}{64}. \]
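The five probabilities in Exercise 6 are differences of the c.d.f. at the rounding cut points. The sketch below assumes the p.d.f. is $f(x) = x/8$ on $0 < x < 4$, the same one used in Exercise 5, so that $F(x) = x^2/16$; that assumption and the names `cdf` and `cuts` are illustrative.

```python
from fractions import Fraction

def cdf(x):
    """F(x) = x^2 / 16 for 0 <= x <= 4 (assumed p.d.f. f(x) = x/8)."""
    x = Fraction(x)
    return x * x / 16

# Y rounds X to the nearest integer, so the cut points are the half-integers.
cuts = [Fraction(0), Fraction(1, 2), Fraction(3, 2),
        Fraction(5, 2), Fraction(7, 2), Fraction(4)]

pf_y = {y: cdf(cuts[y + 1]) - cdf(cuts[y]) for y in range(5)}
for y, prob in pf_y.items():
    print(y, prob)
```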


7. Since the uniform distribution extends over an interval of length 10 units, the value of the p.d.f. must be 1/10 throughout the interval. Hence,
\[ \int_0^7 f(x)\,dx = \frac{7}{10}. \]


8. (a) We must have
\[ \int_{-\infty}^{\infty} f(x)\,dx = \int_0^{\infty} c\exp(-2x)\,dx = \frac{1}{2}c = 1. \]
Therefore, $c = 2$. This p.d.f. has the appearance of Fig. S.3.4.

Figure S.3.4: Figure for Exercise 8a of Sec. 3.2.

(b) $\int_1^2 f(x)\,dx = \exp(-2) - \exp(-4)$.

9. Since $\int_0^{\infty} 1/(1 + x)\,dx = \infty$, there is no constant $c$ such that $\int_0^{\infty} f(x)\,dx = 1$.
10. (a) We must have
\[ \int_{-\infty}^{\infty} f(x)\,dx = \int_0^1 \frac{c}{(1 - x)^{1/2}}\,dx = 2c = 1. \]
Therefore, $c = 1/2$. This p.d.f. has the appearance of Fig. S.3.5.

Figure S.3.5: Figure for Exercise 10a of Sec. 3.2.

It should be noted that although the values of $f(x)$ become arbitrarily large in the neighborhood of $x = 1$, the total area under the curve is equal to 1. It is seen, therefore, that the values of a p.d.f. can be greater than 1 and, in fact, can be arbitrarily large.

(b) $\int_0^{1/2} f(x)\,dx = 1 - (1/2)^{1/2}$.

11. Since $\int_0^1 (1/x)\,dx = \infty$, there is no constant $c$ such that $\int_0^1 f(x)\,dx = 1$.

12. We shall find the c.d.f. of $Y$ and evaluate it at 50. The c.d.f. of a random variable $Y$ is $F(y) = \Pr(Y \le y)$. In Fig. 3.1, on page 94 of the text, the event $\{Y \le y\}$ has area $(y - 1) \times (200 - 4) = 196(y - 1)$ if $1 \le y \le 150$. We need to divide this by the area of the entire rectangle, 29,204. The c.d.f. of $Y$ is then
\[ F(y) = \begin{cases} 0 & \text{for } y < 1, \\ \dfrac{196(y - 1)}{29204} & \text{for } 1 \le y \le 150, \\ 1 & \text{for } y > 150. \end{cases} \]
So, in particular, $\Pr(Y \le 50) = 0.3289$.

13. We find $\Pr(X < 20) = \int_0^{20} cx\,dx = 200c$. Setting this equal to 0.9 yields $c = 0.0045$.


3.3 The Cumulative Distribution Function


Commentary


This section includes a discussion of quantile functions. These arise repeatedly in the construction of hypothesis tests and confidence intervals later in the book.
Solutions to Exercises


1. The c.d.f. $F(x)$ of $X$ is 0 for $x < 0$. It jumps to $0.3 = \Pr(X = 0)$ at $x = 0$, and it jumps to 1 and stays there at $x = 1$. The c.d.f. is sketched in Fig. S.3.6.

Figure S.3.6: C.d.f. of X in Exercise 1 of Sec. 3.3.

2. The c.d.f. must have the appearance of Fig. S.3.7.

Figure S.3.7: C.d.f. for Exercise 2 of Sec. 3.3.

3. Here $\Pr(X = n) = 1/2^n$ for $n = 1, 2, \ldots$. Therefore, the c.d.f. must have the appearance of Fig. S.3.8.

Figure S.3.8: C.d.f. for Exercise 3 of Sec. 3.3.

4. The numbers can be read off of the figure or found by subtracting two numbers off of the figure.

(a) The jump at $x = -1$ is $F(-1) - F(-1^-) = 0.1$.
(b) The c.d.f. to the left of $x = 0$ is $F(0^-) = 0.1$.
(c) The c.d.f. at $x = 0$ is $F(0) = 0.2$.
(d) There is no jump at $x = 1$, so $\Pr(X = 1) = 0$.
(e) $F(3) - F(0) = 0.6$.
(f) $F(3^-) - F(0) = 0.4$.
(g) $F(3) - F(0^-) = 0.7$.
(h) $F(2) - F(1) = 0$.
(i) $F(2) - F(1^-) = 0$.
(j) $1 - F(5) = 0$.
(k) $1 - F(5^-) = 0$.
(l) $F(4) - F(3^-) = 0.2$.
5. \[ f(x) = \frac{dF(x)}{dx} = \begin{cases} 0 & \text{for } x < 0, \\ \dfrac{2}{9}x & \text{for } 0 < x < 3, \\ 0 & \text{for } x > 3. \end{cases} \]
The value of $f(x)$ at $x = 0$ and $x = 3$ is irrelevant. This p.d.f. has the appearance of Fig. S.3.9.

Figure S.3.9: Figure for Exercise 5 of Sec. 3.3.

6. \[ f(x) = \frac{dF(x)}{dx} = \begin{cases} \exp(x - 3) & \text{for } x < 3, \\ 0 & \text{for } x > 3. \end{cases} \]
The value of $f(x)$ at $x = 3$ is irrelevant. This p.d.f. has the appearance of Fig. S.3.10.

Figure S.3.10: Figure for Exercise 6 of Sec. 3.3.

It should be noted that although this p.d.f. is positive over the unbounded interval where $x < 3$, the total area under the curve is finite and is equal to 1.


7. The c.d.f. equals 0 for $x < -2$ and it equals 1 for $x > 8$. For $-2 \le x \le 8$, the c.d.f. equals
\[ F(x) = \int_{-2}^{x} \frac{dy}{10} = \frac{x + 2}{10}. \]
The c.d.f. has the appearance of Fig. S.3.11.

Figure S.3.11: Figure for Exercise 7 of Sec. 3.3.

8. $\Pr(Z \le z)$ is the probability that $Z$ lies within a circle of radius $z$ centered at the origin. This probability is
\[ \frac{\text{Area of circle of radius } z}{\text{Area of circle of radius } 1} = z^2, \quad \text{for } 0 \le z \le 1. \]
The c.d.f. is plotted in Fig. S.3.12.

Figure S.3.12: C.d.f. for Exercise 8 of Sec. 3.3.

9. $\Pr(Y = 0) = \Pr(X \le 1) = 1/5$ and $\Pr(Y = 5) = \Pr(X \ge 3) = 2/5$. Also, $Y$ is distributed uniformly between $Y = 1$ and $Y = 3$, with a total probability of 2/5. Therefore, over this interval $F(y)$ will be linear with a total increase of 2/5. The c.d.f. is plotted in Fig. S.3.13.


10. To find the quantile function $F^{-1}(p)$ when we know the c.d.f., we can set $F(x) = p$ and solve for $x$.
\[ \frac{x}{1 + x} = p; \quad x = p + px; \quad x(1 - p) = p; \quad x = \frac{p}{1 - p}. \]
The quantile function is $F^{-1}(p) = p/(1 - p)$.

11. As in Exercise 10, we set $F(x) = p$ and solve for $x$.
\[ \frac{1}{9}x^2 = p; \quad x^2 = 9p; \quad x = 3p^{1/2}. \]
The quantile function of $X$ is $F^{-1}(p) = 3p^{1/2}$.

Figure S.3.13: C.d.f. for Exercise 9 of Sec. 3.3.

12. Once again, we set $F(x) = p$ and solve for $x$.
\[ \exp(x - 3) = p; \quad x - 3 = \log(p); \quad x = 3 + \log(p). \]
The quantile function of $X$ is $F^{-1}(p) = 3 + \log(p)$.


13. VaR at probability level 0.95 is the negative of the 0.05 quantile. Using the result from Example 3.3.8, the 0.05 quantile of the uniform distribution on the interval $[-12, 24]$ is $0.05 \times 24 - 0.95 \times 12 = -10.2$. So, VaR at probability level 0.95 is 10.2.


14. Using the table of binomial probabilities in the back of the book, we can compute the c.d.f. $F$ of the binomial distribution with parameters 10 and 0.2. We then find the first values of $x$ such that $F(x) \ge 0.25$, $F(x) \ge 0.5$, and $F(x) \ge 0.75$. The first few distinct values of the c.d.f. are

x     0       1       2       3
F(x)  0.1074  0.3758  0.6778  0.8791

So, the quartiles are 1 and 3, while the median is 2.
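For a discrete distribution the quantile function is the smallest $x$ with $F(x) \ge p$, which is easy to compute once the c.d.f. is tabulated. The sketch below rebuilds the binomial c.d.f. for parameters 10 and 0.2; `binom_cdf_table` and `quantile` are illustrative names.

```python
from math import comb

def binom_cdf_table(n, p):
    """Return [F(0), F(1), ..., F(n)] for the binomial distribution."""
    total, table = 0.0, []
    for k in range(n + 1):
        total += comb(n, k) * p**k * (1 - p) ** (n - k)
        table.append(total)
    return table

def quantile(table, prob):
    """Smallest x such that F(x) >= prob."""
    for x, value in enumerate(table):
        if value >= prob:
            return x
    return len(table) - 1   # guards against rounding in the final entry

table = binom_cdf_table(10, 0.2)
quartiles = [quantile(table, q) for q in (0.25, 0.5, 0.75)]
print([round(v, 4) for v in table[:4]], quartiles)
```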


15. Since $f(x) = 0$ for $x \le 0$ and for $x \ge 1$, the c.d.f. $F(x)$ will be flat (0) for $x \le 0$ and flat (1) for $x \ge 1$. Between 0 and 1, we compute $F(x)$ by integrating the p.d.f. For $0 < x < 1$,
\[ F(x) = \int_0^x 2y\,dy = x^2. \]
The requested plot is identical to Fig. S.3.12 for Exercise 8 in this section.


16. For each $0 < p < 1$, we solve for $x$ in the equation $F(x) = p$, with $F$ specified in (3.3.2):
\[ p = 1 - \frac{1}{1 + x}, \]
\[ \frac{1}{1 - p} = 1 + x, \]
\[ \frac{1}{1 - p} - 1 = x. \]
The quantile function is $F^{-1}(p) = 1/(1 - p) - 1$ for $0 < p < 1$.
17. (a) Let $0 < p_1 < p_2 < 1$. Define $A_i = \{x : F(x) \ge p_i\}$ for $i = 1, 2$. Since $p_1 < p_2$ and $F$ is nondecreasing, it follows that $A_2 \subset A_1$. Hence, the smallest number in $A_1$ (which equals $F^{-1}(p_1)$ by definition) is no greater than the smallest number in $A_2$ (which equals $F^{-1}(p_2)$ by definition). That is, $F^{-1}(p_1) \le F^{-1}(p_2)$, and the quantile function is nondecreasing.

(b) Let $x_0 = \lim_{p \to 0,\, p > 0} F^{-1}(p)$. We are asked to prove that $x_0$ is the greatest lower bound of the set $C = \{c : F(c) > 0\}$. First, we show that no $x > x_0$ is a lower bound on $C$. Let $x > x_0$ and $x_1 = (x + x_0)/2$. Then $x_0 < x_1 < x$. Because $F^{-1}(p)$ is nondecreasing, it follows that there exists $p > 0$ such that $F^{-1}(p) < x_1$, which in turn implies that $p \le F(x_1)$, and $F(x_1) > 0$. Hence $x_1 \in C$, and $x$ is not a lower bound on $C$. Next, we prove that $x_0$ is a lower bound on $C$. Let $x \in C$. We need only prove that $x_0 \le x$. Because $F^{-1}(p)$ is nondecreasing, we must have $\lim_{p \to 0,\, p > 0} F^{-1}(p) \le F^{-1}(q)$ for all $q > 0$. Hence, $x_0 \le F^{-1}(q)$ for all $q > 0$. Because $x \in C$, we have $F(x) > 0$. Let $q = F(x)$ so that $q > 0$. Then $x_0 \le F^{-1}(q) \le x$. The proof that $x_1$ is the least upper bound on the set of all $d$ such that $F(d) < 1$ is very similar.

(c) Let $0 < p < 1$. Because $F^{-1}$ is nondecreasing, $F^{-1}(p^-)$ is the least upper bound on the set $C = \{F^{-1}(q) : q < p\}$. We need to show that $F^{-1}(p)$ is also that least upper bound. Clearly, $F^{-1}(p)$ is an upper bound, because $F^{-1}$ is nondecreasing and $p > q$ for all $q < p$. To see that $F^{-1}(p)$ is the least upper bound, let $y$ be an upper bound. We need to show $F^{-1}(p) \le y$. By definition, $F^{-1}(p)$ is the greatest lower bound on the set $D = \{x : F(x) \ge p\}$. Because $y$ is an upper bound on $C$, it follows that $F^{-1}(q) \le y$ for all $q < p$. Hence, $F(y) \ge q$ for all $q < p$. Since this holds for every $q < p$, we have $F(y) \ge p$, hence $y \in D$, and $F^{-1}(p) \le y$.

18. We know that $\Pr(X = c) = F(c) - F(c^-)$. We will prove that $p_1 = F(c)$ and $p_0 = F(c^-)$. For each $p \in (0, 1)$ define
\[ C_p = \{x : F(x) \ge p\}. \]
Condition (i) says that, for every $p \in (p_0, p_1)$, $c$ is the greatest lower bound on the set $C_p$. Hence $F(c) \ge p$ for all $p < p_1$ and $F(c) \ge p_1$. If $F(c) > p_1$, then for $p = (p_1 + F(c))/2$, $F^{-1}(p) \le c$, and condition (iii) rules this out. So $F(c) = p_1$. The rest of the proof is broken into two cases. First, if $p_0 = 0$, then for every $\epsilon > 0$, $c$ is the greatest lower bound on the set $C_\epsilon$. This means that $F(x) < \epsilon$ for all $x < c$. Since this is true for all $\epsilon > 0$, $F(x) = 0$ for all $x < c$, and $F(c^-) = 0 = p_0$. For the second case, assume $p_0 > 0$. Condition (ii) says $F^{-1}(p_0) < c$. Since $F^{-1}(p_0)$ is the greatest lower bound on the set $C_{p_0}$, we have $F(x) < p_0$ for all $x < c$. Hence, $p_0 \ge F(c^-)$. Also, for all $p < p_0$, $p \le F(c^-)$, hence $p_0 \le F(c^-)$. Together, the last two inequalities imply $p_0 = F(c^-)$.

19. First, we show that $F^{-1}(F(x)) \le x$. By definition $F^{-1}(F(x))$ is the smallest $y$ such that $F(y) \ge F(x)$. Clearly $F(x) \ge F(x)$, hence $F^{-1}(F(x)) \le x$. Next, we show that, if $p > F(x)$, then $F^{-1}(p) > x$. Let $p > F(x)$. By Exercise 17, we know that $F^{-1}(p) \ge x$. By definition, $F^{-1}(p)$ is the greatest lower bound on the set $C_p = \{y : F(y) \ge p\}$. All $y \in C_p$ satisfy $F(y) > (p + F(x))/2$. Since $F$ is continuous from the right, $F(F^{-1}(p)) \ge (p + F(x))/2$. But $F(x) < (p + F(x))/2$, so $x \ne F^{-1}(p)$, hence $F^{-1}(p) > x$.

20. Figure S.3.14 has the plotted c.d.f., which equals $0.0045x^2/2$ for $0 \le x \le 20$. On the plot, we see that $F(10) = 0.225$.

Figure S.3.14: C.d.f. for Exercise 20 of Sec. 3.3.


3.4 Bivariate Distributions


Commentary


The bivariate distribution function is mentioned at the end of this section. The only part of this discussion that is used later in the text is the fact that the joint p.d.f. is the second mixed partial derivative of the bivariate c.d.f. (in the discussion of functions of two or more random variables in Sec. 3.9). If an instructor prefers not to discuss how to calculate probabilities of rectangles and is not going to cover functions of two or more random variables, there will be no loss of continuity.


Solutions to Exercises 


1. (a) Let the constant value of the p.d.f. on the rectangle be $c$. The area of the rectangle is 2. So, the integral of the p.d.f. is $2c = 1$, hence $c = 1/2$.

(b) $\Pr(X > Y)$ is the integral of the p.d.f. over that part of the rectangle where $x > y$. This region is shaded in Fig. S.3.15. The region is a trapezoid with area $1 \times (1 + 2)/2 = 1.5$. The integral of the constant 1/2 over this region is then $0.75 = \Pr(X > Y)$.

Figure S.3.15: Region where x > y in Exercise 1b of Sec. 3.4.


2. The answers are found by summing the following entries in the table:

(a) The entries in the third row of the table: 0.27.
(b) The last three columns of the table: 0.53.
(c) The nine entries in the upper left corner of the table: 0.69.
(d) (0, 0), (1, 1), (2, 2), and (3, 3): 0.2.
(e) (1, 0), (2, 0), (3, 0), (2, 1), (3, 1), (3, 2): 0.25.

3. (a) If we sum $f(x,y)$ over the 25 possible pairs of values $(x,y)$, we obtain $40c$. Since this sum must be equal to 1, it follows that $c = 1/40$.

(b) $f(0,-2) = 2c = 1/20$.

(c) $\Pr(X = 1) = \sum_{y=-2}^{2} f(1,y) = 7/40$.

(d) The answer is found by summing $f(x,y)$ over the following pairs: $(-2,-2)$, $(-2,-1)$, $(-1,-2)$, $(-1,-1)$, $(-1,0)$, $(0,-1)$, $(0,0)$, $(0,1)$, $(1,0)$, $(1,1)$, $(1,2)$, $(2,1)$, and $(2,2)$. The sum is 0.7.


4. (a) $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,dx\,dy = \int_0^1\int_0^2 cy^2\,dx\,dy = 2c/3$. Since the value of this integral must be 1, it follows that $c = 3/2$.

(b) The region over which to integrate is shaded in Fig. S.3.16.

Figure S.3.16: Region of integration for Exercise 4b of Sec. 3.4.

\[ \Pr(X + Y > 2) = \iint_{\text{shaded region}} f(x,y)\,dx\,dy = \int_0^1\int_{2-y}^{2} \frac{3}{2}y^2\,dx\,dy = \frac{3}{8}. \]

(c) The region over which to integrate is shaded in Fig. S.3.17.

Figure S.3.17: Region of integration for Exercise 4c of Sec. 3.4.

\[ \Pr\left(Y < \frac{1}{2}\right) = \int_0^2\int_0^{1/2} \frac{3}{2}y^2\,dy\,dx = \frac{1}{8}. \]

(d) The region over which to integrate is shaded in Fig. S.3.18.

Figure S.3.18: Region of integration for Exercise 4d of Sec. 3.4.

\[ \Pr(X \le 1) = \int_0^1\int_0^1 \frac{3}{2}y^2\,dy\,dx = \frac{1}{2}. \]

(e) The probability that $(X,Y)$ will lie on the line $x = 3y$ is 0 for every continuous joint distribution.


5. (a) By sketching the curve $y = 1 - x^2$, we find that $y \le 1 - x^2$ for all points on or below this curve. Also, $y \ge 0$ for all points on or above the x-axis. Therefore, $0 \le y \le 1 - x^2$ only for points in the shaded region in Fig. S.3.19.

Figure S.3.19: Figure for Exercise 5a of Sec. 3.4.

Hence,
\[ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,dx\,dy = \int_{-1}^{1}\int_0^{1-x^2} c(x^2 + y)\,dy\,dx = \frac{4}{5}c. \]
Therefore, $c = 5/4$.

(b) Integration is done over the shaded region in Fig. S.3.20.
\[ \Pr\left(0 \le X \le \frac{1}{2}\right) = \iint_{\text{shaded region}} f(x,y)\,dx\,dy = \int_0^{1/2}\int_0^{1-x^2} \frac{5}{4}(x^2 + y)\,dy\,dx = \frac{79}{256}. \]

Figure S.3.20: Region of integration for Exercise 5b of Sec. 3.4.

(c) The region over which to integrate is shaded in Fig. S.3.21.

Figure S.3.21: Region of integration for Exercise 5c of Sec. 3.4.

\[ \Pr(Y \le X + 1) = \iint_{\text{shaded region}} f(x,y)\,dx\,dy = 1 - \iint_{\text{unshaded region}} f(x,y)\,dx\,dy = 1 - \int_{-1}^{0}\int_{x+1}^{1-x^2} \frac{5}{4}(x^2 + y)\,dy\,dx = \frac{13}{16}. \]

(d) The probability that $(X,Y)$ will lie on the curve $y = x^2$ is 0 for every continuous joint distribution.


6. (a) The region $S$ is the shaded region in Fig. S.3.22. Since the area of $S$ is 2, and the joint p.d.f. is to be constant over $S$, then the value of the constant must be 1/2.

Figure S.3.22: Figure for Exercise 6a of Sec. 3.4.

(b) The probability that $(X,Y)$ will belong to any subset $S_0$ is proportional to the area of that subset. Therefore,
\[ \Pr[(X,Y) \in S_0] = \iint_{S_0} \frac{1}{2}\,dx\,dy = \frac{1}{2}(\text{area of } S_0) = \frac{\alpha}{2}. \]


Figure S.3.23: Figure for Exercise 7a of Sec. 3.4.

7. (a) $\Pr(X < 1/4)$ will be equal to the sum of the probabilities of the corners (0, 0) and (0, 1) and the probability that the point is an interior point of the square and lies in the shaded region in Fig. S.3.23. The probability that the point will be an interior point of the square rather than one


Copyright © 2012 Pearson Education, Inc. Publishing as Addison-Wesley. 




of the four corners is $1 - (0.1 + 0.2 + 0.4 + 0.1) = 0.2$. The probability that it will lie in the shaded region, given that it is an interior point, is 1/4. Therefore,
\[ \Pr\left(X < \frac{1}{4}\right) = 0.1 + 0.4 + (0.2)\left(\frac{1}{4}\right) = 0.55. \]

(b) The region over which to integrate is shaded in Fig. S.3.24.

Figure S.3.24: Figure for Exercise 7b of Sec. 3.4.

\[ \Pr(X + Y \le 1) = 0.1 + 0.2 + 0.4 + (0.2)\left(\frac{1}{2}\right) = 0.8. \]


8. (a) Since the joint c.d.f. is continuous and is twice-differentiable in the given rectangle, the joint 
distribution of X and Y is continuous. Therefore, 


Prl< X<2and 1<Y¥ <2) = Prl<xX<2and l< Ya 2] 


24 6 10 2 5 
F002) =F, = FOF), = — 4 ee 2 


Pr2<X% <4and 2<Y <4) = PrQ@<X<3eand 2<Y< 4) 
F(3,4) — F(2,4) — F(3,2) + F(2,2) 
64 66 24 25 


156 156 156 78° 
(c) Since y must lie in the interval 0 ≤ y ≤ 4, F_2(y) = 0 for y < 0 and F_2(y) = 1 for y > 4. For
0 ≤ y ≤ 4,

\[ F_2(y) = \lim_{x \to \infty} F(x,y) = \lim_{x \to 3} \frac{1}{156}\,xy(x^2 + y) = \frac{1}{52}\,y(9 + y). \]

(d) We have f(x,y) = 0 unless 0 ≤ x ≤ 3 and 0 ≤ y ≤ 4. In this rectangle we have

\[ f(x,y) = \frac{\partial^2 F(x,y)}{\partial x\,\partial y} = \frac{1}{156}(3x^2 + 2y). \]

(e) The region over which to integrate is shaded in Fig. S.3.25.

\[ \Pr(Y \le X) = \iint_{\text{shaded region}} f(x,y)\,dx\,dy = \int_0^3 \int_0^x \frac{1}{156}(3x^2 + 2y)\,dy\,dx = \frac{93}{208}. \]

Figure S.3.25: Figure for Exercise 8e of Sec. 3.4.


9. The joint p.d.f. of water demand X and electric demand Y is in (3.4.2), and is repeated here:

\[ f(x,y) = \begin{cases} 1/29204 & \text{if } 4 \le x \le 200 \text{ and } 1 \le y \le 150, \\ 0 & \text{otherwise.} \end{cases} \]

We need to integrate this function over the set where x > y. That region can be written as {(x,y) :
4 < x < 200, 1 < y < min{x,150}}. The reason for the complicated upper limit on y is that we require
both y < x and y < 150.

\[ \int_4^{200} \int_1^{\min\{x,150\}} \frac{1}{29204}\,dy\,dx = \int_4^{200} \frac{\min\{x-1,149\}}{29204}\,dx = \int_4^{150} \frac{x-1}{29204}\,dx + \int_{150}^{200} \frac{149}{29204}\,dx \]
\[ = \left. \frac{(x-1)^2}{2 \times 29204} \right|_{x=4}^{150} + \frac{50 \times 149}{29204} = \frac{149^2 - 3^2}{58408} + \frac{7450}{29204} = 0.63505. \]
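The two-piece integral for Exercise 9 can be reproduced exactly. The Python sketch below is an illustrative check, not part of the original solution; it assumes the constant density 1/29204 on the rectangle 4 ≤ x ≤ 200, 1 ≤ y ≤ 150, and the function name is hypothetical.

```python
from fractions import Fraction

# Constant density on the rectangle 4 <= x <= 200, 1 <= y <= 150.
DENSITY = Fraction(1, 29204)

def prob_water_exceeds_electric():
    """Exact Pr(X > Y) from the two-piece integral in the solution."""
    # For 4 <= x <= 150 the inner integral has length x - 1;
    # integrating (x - 1) over [4, 150] gives (149^2 - 3^2) / 2.
    piece1 = Fraction(149**2 - 3**2, 2)
    # For 150 <= x <= 200 the inner integral has constant length 149.
    piece2 = Fraction(50 * 149)
    return DENSITY * (piece1 + piece2)

p = prob_water_exceeds_electric()
print(p, float(p))
```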


10. (a) The sum of f(x,y) over all x for each fixed y is

\[ \exp(-3y) \sum_{x=0}^{\infty} \frac{(2y)^x}{x!} = \exp(-3y)\exp(2y) = \exp(-y), \]

where the first equality follows from the power series expansion of exp(2y). The integral of the
resulting sum is easily calculated to be 1.

(b) We can compute Pr(X = 0) by integrating f(0, y) over all y:

\[ \Pr(X = 0) = \int_0^{\infty} \frac{(2y)^0}{0!} \exp(-3y)\,dy = \frac{1}{3}. \]


11. Let f(x,y) stand for the joint p.f. in Table 3.3 in the text for x = 0, 1 and y = 1, 2, 3, 4.


(a) We are asked for the probability for the set {Y ∈ {2,3}} ∩ {X = 1}, which is f(1,2) + f(1,3) =
0.166 + 0.107 = 0.273.


(b) This time, we want Pr(X = 0) = f(0,1) + f(0,2) + f(0,3) + f(0,4) = 0.513. 


3.5 Marginal Distributions 


Commentary 


Students can get confused when solving problems like Exercises 7 and 8 in this section. They notice that the
functional form of f(x,y) factors into g_1(x)g_2(y) for those (x,y) pairs such that f(x,y) > 0, but they don't
understand that the factorization needs to hold even for those (x,y) pairs such that f(x,y) = 0. When the
two marginal p.d.f.'s are both strictly positive on intervals, then the set of (x,y) pairs where f_1(x)f_2(y) > 0
must be a rectangle (with sides parallel to the coordinate axes), even if the rectangle is infinite in one or
more directions. Hence, it is a necessary condition for independence that the set of (x,y) pairs such that
f(x,y) > 0 be a rectangle with sides parallel to the coordinate axes. Of course, it is also necessary that
f(x,y) = g_1(x)g_2(y) for those (x,y) such that f(x,y) > 0. The two necessary conditions together are
sufficient to ensure independence, but neither is sufficient alone. See the solution to Exercise 8 below for an
illustration of that point.


Solutions to Exercises 


1. The joint p.d.f. is constant over a rectangle with sides parallel to the coordinate axes. So, for each x,
the integral over y will equal the constant times the length of the interval of y values, namely d - c.
Similarly, for each y, the integral over x will equal the constant times the length of the interval of
x values, namely b - a. Of course the constant k must equal one over the area of the rectangle. So
k = 1/[(b - a)(d - c)]. So the marginal p.d.f.'s of X and Y are

\[ f_1(x) = \begin{cases} \dfrac{1}{b-a} & \text{for } a \le x \le b, \\ 0 & \text{otherwise,} \end{cases} \qquad f_2(y) = \begin{cases} \dfrac{1}{d-c} & \text{for } c \le y \le d, \\ 0 & \text{otherwise.} \end{cases} \]


2. (a) For x = 0, 1, 2, we have

\[ f_1(x) = \sum_{y=0}^{3} f(x,y) = \frac{1}{30}(4x + 6) = \frac{1}{15}(2x + 3). \]

Similarly, for y = 0, 1, 2, 3, we have

\[ f_2(y) = \sum_{x=0}^{2} f(x,y) = \frac{1}{30}(3 + 3y) = \frac{1}{10}(1 + y). \]

(b) X and Y are not independent because it is not true that f(x,y) = f_1(x)f_2(y) for all possible
values of x and y.


3. (a) For 0 < x < 2, we have

\[ f_1(x) = \int_0^1 f(x,y)\,dy = \frac{1}{2}. \]

Also, f_1(x) = 0 for x outside the interval 0 < x < 2. Similarly, for 0 < y < 1,

\[ f_2(y) = \int_0^2 f(x,y)\,dx = 3y^2. \]

Also, f_2(y) = 0 for y outside the interval 0 < y < 1.
(b) X and Y are independent because f(x,y) = f_1(x)f_2(y) for -∞ < x < ∞ and -∞ < y < ∞.
(c) We have

\[ \Pr\left(X < 1 \text{ and } Y > \frac{1}{2}\right) = \int_0^1 \int_{1/2}^1 f(x,y)\,dy\,dx = \int_0^1 \int_{1/2}^1 f_1(x)f_2(y)\,dy\,dx \]
\[ = \int_0^1 f_1(x)\,dx \int_{1/2}^1 f_2(y)\,dy = \Pr(X < 1)\Pr\left(Y > \frac{1}{2}\right). \]

Therefore, by the definition of the independence of two events (Definition 2.2.1), the two given
events are independent.

We can also reach this answer, without carrying out the above calculation, by reasoning as follows:
Since the random variables X and Y are independent, and since the occurrence or nonoccurrence
of the event {X < 1} depends on the value of X only while the occurrence or nonoccurrence of
the event {Y > 1/2} depends on the value of Y only, it follows that these two events must be
independent.


4. (a) The region where f(x,y) is non-zero is the shaded region in Fig. S.3.26. It can be seen that the
possible values of X are confined to the interval -1 < X < 1. Hence, f_1(x) = 0 for values of x
outside this interval. For -1 < x < 1, we have

\[ f_1(x) = \int_0^{1-x^2} f(x,y)\,dy = \frac{15}{4}x^2(1 - x^2). \]

Figure S.3.26: Figure for Exercise 4a of Sec. 3.5.

Similarly, it can be seen from the sketch that the possible values of Y are confined to the interval
0 < Y < 1. Hence, f_2(y) = 0 for values of y outside this interval. For 0 < y < 1, we have

\[ f_2(y) = \int_{-(1-y)^{1/2}}^{(1-y)^{1/2}} f(x,y)\,dx = \frac{5}{2}(1 - y)^{3/2}. \]

(b) X and Y are not independent because f(x,y) ≠ f_1(x)f_2(y).


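5. (a) Since X and Y are independent,

\[ f(x,y) = \Pr(X = x \text{ and } Y = y) = \Pr(X = x)\Pr(Y = y) = p_x p_y. \]

(b) \[ \Pr(X = Y) = \sum_{i=0}^{3} f(i,i) = \sum_{i=0}^{3} p_i^2 = 0.3. \]

(c) \[ \Pr(X > Y) = f(1,0) + f(2,0) + f(3,0) + f(2,1) + f(3,1) + f(3,2) = 0.35. \]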


6. (a) Since X and Y are independent,

\[ f(x,y) = f_1(x)f_2(y) = g(x)g(y) = \begin{cases} \dfrac{9}{64}x^2y^2 & \text{for } 0 \le x \le 2,\ 0 \le y \le 2, \\ 0 & \text{otherwise.} \end{cases} \]

(b) Since X and Y have a continuous joint distribution, Pr(X = Y) = 0.
(c) Since X and Y are independent random variables with the same probability distribution, it must
be true that Pr(X > Y) = Pr(Y > X). Since Pr(X = Y) = 0, it therefore follows that
Pr(X > Y) = 1/2.

(d) Pr(X + Y < 1) = Pr(shaded region in sketch)

\[ = \int_0^1 \int_0^{1-y} f(x,y)\,dx\,dy = \frac{1}{1280}. \]

Figure S.3.27: Figure for Exercise 6d of Sec. 3.5.


7. Since f(x,y) = 0 outside a rectangle and f(x,y) can be factored as in Eq. (3.5.7) inside the rectangle
(use h_1(x) = 2x and h_2(y) = exp(-y)), it follows that X and Y are independent.


8. Although f(x,y) can be factored as in Eq. (3.5.7) inside the triangle where f(x,y) > 0, the fact that
f(x,y) > 0 inside a triangle, rather than a rectangle, implies that X and Y cannot be independent.
(Note that y > 0 should have appeared as part of the condition for f(x,y) > 0 in the statement of
the exercise.) For example, to factor f(x,y) as in Eq. (3.5.7) we write f(x,y) = g_1(x)g_2(y). Since
f(1/3,1/4) = 2 and f(1/6,3/4) = 3, it must be that g_1(1/3) > 0 and g_2(3/4) > 0. However, since
f(1/3,3/4) = 0, it must be that either g_1(1/3) = 0 or g_2(3/4) = 0. These facts contradict each other,
hence f cannot have a factorization as in (3.5.7).


9. (a) Since f(x,y) is constant over the rectangle S and the area of S is 6 units, it follows that f(x,y) =
1/6 inside S and f(x,y) = 0 outside S. Next, for 0 < x < 2,

\[ f_1(x) = \int_{-\infty}^{\infty} f(x,y)\,dy = \int_1^4 \frac{1}{6}\,dy = \frac{1}{2}. \]

Also, f_1(x) = 0 otherwise. Similarly, for 1 < y < 4,

\[ f_2(y) = \int_0^2 \frac{1}{6}\,dx = \frac{1}{3}. \]

Also, f_2(y) = 0 otherwise. Thus, the marginal distributions of both X and Y are uniform
distributions.
(b) Since f(x,y) = f_1(x)f_2(y) for all values of x and y, it follows that X and Y are independent.
10. (a) f(x,y) is constant over the circle S in Fig. S.3.28. The area of S is π units, and it follows that
f(x,y) = 1/π inside S and f(x,y) = 0 outside S. Next, the possible values of x range from -1 to
1. For any value of x in this interval, f(x,y) > 0 only for values of y between -(1 - x^2)^{1/2} and
(1 - x^2)^{1/2}. Hence, for -1 < x < 1,

\[ f_1(x) = \int_{-(1-x^2)^{1/2}}^{(1-x^2)^{1/2}} \frac{1}{\pi}\,dy = \frac{2}{\pi}(1 - x^2)^{1/2}. \]

Also, f_1(x) = 0 otherwise. By symmetry, the random variable Y will have the same marginal
p.d.f. as X.

Figure S.3.28: Figure for Exercise 10 of Sec. 3.5.

(b) Since f(x,y) ≠ f_1(x)f_2(y), X and Y are not independent.
The conclusions found in this exercise, in which X and Y have a uniform distribution over a circle,
should be contrasted with the conclusions found in Exercise 9, in which X and Y had a uniform
distribution over a rectangle with sides parallel to the axes.


11. Let X and Y denote the arrival times of the two persons, measured in terms of the number of minutes
after 5 P.M. Then X and Y each have the uniform distribution on the interval (0, 60) and they are
independent. Therefore, the joint p.d.f. of X and Y is

\[ f(x,y) = \begin{cases} \dfrac{1}{3600} & \text{for } 0 < x < 60,\ 0 < y < 60, \\ 0 & \text{otherwise.} \end{cases} \]

We must calculate Pr(|X - Y| < 10), which is equal to the probability that the point (X,Y) lies in the
shaded region in Fig. S.3.29. Since the joint p.d.f. of X and Y is constant over the entire square, this
probability is equal to (area of shaded region)/3600. The area of the shaded region is 1100. Therefore,
the required probability is 1100/3600 = 11/36.

Figure S.3.29: Figure for Exercise 11 of Sec. 3.5.
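The area argument can be checked both exactly and by simulation. The Python sketch below is an illustrative addition under the assumptions of the solution (independent uniform arrival times on (0, 60) and a 10-minute window); the function names, the seed, and the number of trials are arbitrary choices made here.

```python
import random
from fractions import Fraction

def exact_meeting_probability(window=10, span=60):
    """Area of the band |x - y| < window divided by the area of the square."""
    # The unshaded part is two right triangles with legs (span - window),
    # which together form a square of side (span - window).
    unshaded = (span - window) ** 2
    return Fraction(span**2 - unshaded, span**2)

def simulated_meeting_probability(trials=200_000, seed=0):
    """Monte Carlo estimate of Pr(|X - Y| < 10) for independent uniforms."""
    rng = random.Random(seed)
    hits = sum(
        abs(rng.uniform(0, 60) - rng.uniform(0, 60)) < 10
        for _ in range(trials)
    )
    return hits / trials

exact = exact_meeting_probability()
estimate = simulated_meeting_probability()
print(exact, estimate)
```

The simulated value should be close to 11/36, with an error that shrinks roughly like one over the square root of the number of trials.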


12. Let the rectangular region be R = {(x,y) : x_0 < x < x_1, y_0 < y < y_1} with x_0 and/or y_0 possibly -∞
and x_1 and/or y_1 possibly ∞. For the “if” direction, assume that f(x,y) = h_1(x)h_2(y) for all (x,y)
that satisfy f(x,y) > 0. Then define

\[ h_1^*(x) = \begin{cases} h_1(x) & \text{if } x_0 < x < x_1, \\ 0 & \text{otherwise,} \end{cases} \qquad h_2^*(y) = \begin{cases} h_2(y) & \text{if } y_0 < y < y_1, \\ 0 & \text{otherwise.} \end{cases} \]

Then h_1^*(x)h_2^*(y) = h_1(x)h_2(y) = f(x,y) for all (x,y) ∈ R and h_1^*(x)h_2^*(y) = 0 = f(x,y) for all
(x,y) ∉ R. Hence f(x,y) = h_1^*(x)h_2^*(y) for all (x,y), and X and Y are independent.


For the “only if” direction, assume that X and Y are independent. According to Theorem 3.5.5,
f(x,y) = h_1(x)h_2(y) for all (x,y). Then f(x,y) = h_1(x)h_2(y) for all (x,y) ∈ R.


13. Since f(x,y) = f(y,x) for all (x,y), it follows that the marginal p.d.f.'s will be the same. Each of those
marginals will equal the integral of f(x,y) over the other variable. For example, to find f_1(x), note
that for each x, the values of y such that f(x,y) > 0 form the interval [-(1 - x^2)^{1/2}, (1 - x^2)^{1/2}]. Then, for
-1 < x < 1,

\[ f_1(x) = \int f(x,y)\,dy = \int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} k x^2 y^2\,dy = \left. \frac{k x^2 y^3}{3} \right|_{y=-\sqrt{1-x^2}}^{\sqrt{1-x^2}} = 2k x^2 (1 - x^2)^{3/2}/3. \]


14. The set in Fig. 3.12 is not rectangular, so X and Y are not independent.

15. (a) Figure S.3.30 shows the region where f(x,y) > 0 as the union of two shaded rectangles. Although
the region is not a rectangle, it is a product set. That is, it has the form {(x,y) : x ∈ A, y ∈ B}
for two sets A and B of real numbers.

Figure S.3.30: Region of positive p.d.f. for Exercise 15a of Sec. 3.5.

(b) The marginal p.d.f. of X is

\[ f_1(x) = \int_0^1 f(x,y)\,dy = \begin{cases} \dfrac{1}{3} & \text{if } 1 < x < 3, \\ \dfrac{1}{6} & \text{if } 6 < x < 8. \end{cases} \]

The marginal p.d.f. of Y is

\[ f_2(y) = \int_1^3 \frac{1}{3}\,dx + \int_6^8 \frac{1}{6}\,dx = 1, \]

for 0 < y < 1. The distribution of Y is the uniform distribution on the interval [0, 1].

(c) The product of the two marginal p.d.f.'s is

\[ f_1(x)f_2(y) = \begin{cases} \dfrac{1}{3} & \text{if } 1 < x < 3 \text{ and } 0 < y < 1, \\ \dfrac{1}{6} & \text{if } 6 < x < 8 \text{ and } 0 < y < 1, \\ 0 & \text{otherwise,} \end{cases} \]

which is the same as f(x,y), hence the two random variables are independent. Although the
region where f(x,y) > 0 is not a rectangle, it is a product set as we saw in part (a). Although
it is sufficient in Theorem 3.5.6 for the region where f(x,y) > 0 to be a rectangle, it is necessary
that the region be a product set. Technically, it is necessary that there is a version of the p.d.f.
that is strictly positive on a product set. For continuous joint distributions, one can set the p.d.f.
to arbitrary values on arbitrary one-dimensional curves without changing its being a joint p.d.f.


3.6 Conditional Distributions 


Commentary 


When introducing conditional distributions given continuous random variables, it is important to stress that 
we are not conditioning on a set of 0 probability, even if the popular notation makes it appear that way. 
The note on page 146 can be helpful for students who understand two-variable calculus. Also, Exercise 25 in 
Sec. 3.11 can provide additional motivation for the idea that the conditional distribution of X given Y = y 
is really a surrogate for the conditional distribution of X given that Y is close to y, but we don’t wish to 
say precisely how close. Exercise 26 in Sec. 3.11 (the Borel paradox) brings home the point that conditional 
distributions really are not conditional on the probability 0 events such as {Y = y}. 

Also, it is useful to stress that conditional distributions behave just like distributions. In particular, 
conditional probabilities can be calculated from conditional p.f.’s and conditional p.d.f.’s in the same way 
that probabilities are calculated from p.f.’s and p.d.f.’s. Also, be sure to advertise that all future concepts 
and theorems will have conditional versions that behave just like the marginal versions. 


Solutions to Exercises 


1. We begin by finding the marginal p.d.f. of Y. The set of x values for which f(x,y) > 0 is the interval
[-(1 - y^2)^{1/2}, (1 - y^2)^{1/2}]. So, the marginal p.d.f. of Y is, for -1 < y < 1,

\[ f_2(y) = \int_{-(1-y^2)^{1/2}}^{(1-y^2)^{1/2}} k x^2 y^2\,dx = \left. \frac{k y^2}{3} x^3 \right|_{x=-(1-y^2)^{1/2}}^{(1-y^2)^{1/2}} = \frac{2k}{3} y^2 (1 - y^2)^{3/2}, \]

and 0 otherwise. The conditional p.d.f. of X given Y = y is the ratio of the joint p.d.f. to the marginal
p.d.f. just found.

\[ g_1(x|y) = \begin{cases} \dfrac{3x^2}{2(1 - y^2)^{3/2}} & \text{for } -(1 - y^2)^{1/2} < x < (1 - y^2)^{1/2}, \\ 0 & \text{otherwise.} \end{cases} \]

2. (a) We have Pr(Junior) = 0.04 + 0.20 + 0.09 = 0.33. Therefore,

\[ \Pr(\text{Never}|\text{Junior}) = \frac{\Pr(\text{Junior and Never})}{\Pr(\text{Junior})} = \frac{0.04}{0.33} = \frac{4}{33}. \]

(b) The only way we can use the fact that a student visited the museum three times is to classify the
student as having visited more than once. We have

Pr(More than once) = 0.04 + 0.04 + 0.09 + 0.10 = 0.27.

Therefore,

\[ \Pr(\text{Senior}|\text{More than once}) = \frac{\Pr(\text{Senior and More than once})}{\Pr(\text{More than once})} = \frac{0.10}{0.27} = \frac{10}{27}. \]


3. The joint p.d.f. of X and Y is positive for all points inside the circle S shown in the sketch. Since the
area of S is 9π and the joint p.d.f. of X and Y is constant over S, this joint p.d.f. must have the form:

\[ f(x,y) = \begin{cases} \dfrac{1}{9\pi} & \text{for } (x,y) \in S, \\ 0 & \text{otherwise.} \end{cases} \]

Figure S.3.31: Figure for Exercise 3 of Sec. 3.6.

It can be seen from Fig. S.3.31 that the possible values of X lie between -2 and 4. Therefore, for
-2 < x < 4,

\[ f_1(x) = \int_{-2-[9-(x-1)^2]^{1/2}}^{-2+[9-(x-1)^2]^{1/2}} \frac{1}{9\pi}\,dy = \frac{2}{9\pi}[9 - (x-1)^2]^{1/2}. \]

(a) It follows that for -2 < x < 4 and -2 - [9 - (x-1)^2]^{1/2} < y < -2 + [9 - (x-1)^2]^{1/2},

\[ g_2(y|x) = \frac{f(x,y)}{f_1(x)} = \frac{1}{2}[9 - (x-1)^2]^{-1/2}. \]

(b) When X = 2, it follows from part (a) that

\[ g_2(y|x = 2) = \begin{cases} \dfrac{1}{2\sqrt{8}} & \text{for } -2 - \sqrt{8} < y < -2 + \sqrt{8}, \\ 0 & \text{otherwise.} \end{cases} \]

Therefore,

\[ \Pr(Y > 0|X = 2) = \int_0^{-2+\sqrt{8}} g_2(y|x = 2)\,dy = \frac{-2 + \sqrt{8}}{2\sqrt{8}} = \frac{2 - \sqrt{2}}{4}. \]


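4. (a) For 0 < y < 1, the marginal p.d.f. of Y is

\[ f_2(y) = \int_0^1 f(x,y)\,dx = c\left(\frac{1}{2} + y^2\right). \]

Therefore, for 0 < x < 1 and 0 < y < 1, the conditional p.d.f. of X given that Y = y is

\[ g_1(x|y) = \frac{f(x,y)}{f_2(y)} = \frac{x + y^2}{\frac{1}{2} + y^2}. \]

It should be noted that it was not necessary to evaluate the constant c in order to determine this
conditional p.d.f.
(b) When Y = 1/2, it follows from part (a) that

\[ g_1\left(x \,\Big|\, y = \frac{1}{2}\right) = \begin{cases} \dfrac{4}{3}\left(x + \dfrac{1}{4}\right) & \text{for } 0 < x < 1, \\ 0 & \text{otherwise.} \end{cases} \]

Therefore,

\[ \Pr\left(X < \frac{1}{2} \,\Big|\, Y = \frac{1}{2}\right) = \int_0^{1/2} g_1\left(x \,\Big|\, y = \frac{1}{2}\right) dx = \frac{1}{3}. \]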
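5. (a) The joint p.d.f. f(x,y) is given by Eq. (3.6.15) and the marginal p.d.f. f_2(y) was also given in
Example 3.6.10. Hence, for 0 < y < 1 and 0 < x < y, we have

\[ g_1(x|y) = \frac{f(x,y)}{f_2(y)} = \frac{-1}{(1 - x)\log(1 - y)}. \]

(b) When Y = 3/4, it follows from part (a) that

\[ g_1\left(x \,\Big|\, y = \frac{3}{4}\right) = \begin{cases} \dfrac{1}{(1 - x)\log 4} & \text{for } 0 < x < \frac{3}{4}, \\ 0 & \text{otherwise.} \end{cases} \]

Therefore,

\[ \Pr\left(X > \frac{1}{2} \,\Big|\, Y = \frac{3}{4}\right) = \int_{1/2}^{3/4} g_1\left(x \,\Big|\, y = \frac{3}{4}\right) dx = \frac{\log 4 - \log 2}{\log 4} = \frac{1}{2}. \]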


6. Since f(x,y) = 0 outside a rectangle with sides parallel to the x and y axes and since f(x,y) can be
factored as in Eq. (3.5.7), with g_1(x) = c sin(x) and g_2(y) = 1, it follows that X and Y are independent
random variables. Furthermore, for 0 < y < 3, the marginal p.d.f. f_2(y) must be proportional to g_2(y).
In other words, f_2(y) must be constant for 0 < y < 3. Hence, Y has the uniform distribution on the
interval [0,3] and

\[ f_2(y) = \begin{cases} \dfrac{1}{3} & \text{for } 0 < y < 3, \\ 0 & \text{otherwise.} \end{cases} \]

(a) Since X and Y are independent, the conditional p.d.f. of Y for any given value of X is the same
as the marginal p.d.f. f_2(y).
(b) Since X and Y are independent,

\[ \Pr(1 < Y < 2|X = 0.73) = \Pr(1 < Y < 2) = \int_1^2 f_2(y)\,dy = \frac{1}{3}. \]


7. The joint p.d.f. of X and Y is positive inside the triangle S shown in Fig. S.3.32. It is seen from
Fig. S.3.32 that the possible values of X lie between 0 and 2. Hence, for 0 < x < 2,

\[ f_1(x) = \int_0^{4-2x} f(x,y)\,dy = \frac{3}{8}(2 - x)^2. \]

Figure S.3.32: Figure for Exercise 7 of Sec. 3.6.

(a) It follows that for 0 < x < 2 and 0 < y < 4 - 2x,

\[ g_2(y|x) = \frac{f(x,y)}{f_1(x)} = \frac{4 - 2x - y}{2(x - 2)^2}. \]

(b) When X = 1/2, it follows from part (a) that

\[ g_2\left(y \,\Big|\, x = \frac{1}{2}\right) = \begin{cases} \dfrac{2}{9}(3 - y) & \text{for } 0 < y < 3, \\ 0 & \text{otherwise.} \end{cases} \]

Therefore,

\[ \Pr\left(Y \ge 2 \,\Big|\, X = \frac{1}{2}\right) = \int_2^3 g_2\left(y \,\Big|\, x = \frac{1}{2}\right) dy = \frac{1}{9}. \]


8. (a) The answer is

\[ \int_0^1 \int_{0.8}^1 f(x,y)\,dx\,dy = 0.264. \]

(b) For 0 < y < 1, the marginal p.d.f. of Y is

\[ f_2(y) = \int_0^1 f(x,y)\,dx = \frac{2}{5}(1 + 3y). \]

Hence, for 0 < x < 1 and 0 < y < 1,

\[ g_1(x|y) = \frac{2x + 3y}{1 + 3y}. \]

When Y = 0.3, it follows that

\[ g_1(x|y = 0.3) = \frac{2x + 0.9}{1.9} \quad \text{for } 0 < x < 1. \]

Hence,

\[ \Pr(X > 0.8|Y = 0.3) = \int_{0.8}^1 g_1(x|y = 0.3)\,dx = 0.284. \]

(c) For 0 < x < 1, the marginal p.d.f. of X is

\[ f_1(x) = \int_0^1 f(x,y)\,dy = \frac{2}{5}\left(2x + \frac{3}{2}\right). \]

Hence, for 0 < x < 1 and 0 < y < 1,

\[ g_2(y|x) = \frac{2x + 3y}{2x + \frac{3}{2}}. \]

When X = 0.3, it follows that

\[ g_2(y|x = 0.3) = \frac{0.6 + 3y}{2.1} \quad \text{for } 0 < y < 1. \]

Hence,

\[ \Pr(Y > 0.8|X = 0.3) = \int_{0.8}^1 g_2(y|x = 0.3)\,dy = 0.314. \]
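The three numerical answers in Exercise 8 can be reproduced with exact rational arithmetic. The Python sketch below is an illustrative check, not part of the original solution; it assumes the joint p.d.f. (2/5)(2x + 3y) on the unit square (the form implied by the marginals above) and uses the closed-form antiderivatives of the integrands. The function names are hypothetical.

```python
from fractions import Fraction as Fr

def pr_x_exceeds_given_y(a, y):
    """Pr(X > a | Y = y): integral of (2x + 3y)/(1 + 3y) over a < x < 1."""
    return ((1 - a**2) + 3 * y * (1 - a)) / (1 + 3 * y)

def pr_y_exceeds_given_x(b, x):
    """Pr(Y > b | X = x): integral of (2x + 3y)/(2x + 3/2) over b < y < 1."""
    return (2 * x * (1 - b) + Fr(3, 2) * (1 - b**2)) / (2 * x + Fr(3, 2))

a = Fr(8, 10)
# Part (a): integrate (2/5)(2x + 3y) over 0.8 < x < 1 and 0 < y < 1.
part_a = Fr(2, 5) * ((1 - a**2) + Fr(3, 2) * (1 - a))
part_b = pr_x_exceeds_given_y(a, Fr(3, 10))
part_c = pr_y_exceeds_given_x(a, Fr(3, 10))
print(float(part_a), float(part_b), float(part_c))
```

Comparing part (a) with part (b) shows how conditioning on Y = 0.3 changes the probability that X exceeds 0.8.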


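9. Let Y denote the instrument that is chosen. Then Pr(Y = 1) = Pr(Y = 2) = 1/2. In this exercise the
distribution of X is continuous and the distribution of Y is discrete. Hence, the joint distribution of X
and Y is a mixed distribution, as described in Sec. 3.4. In this case, the joint p.f./p.d.f. of X and Y is
as follows:

\[ f(x,y) = \begin{cases} \dfrac{1}{2} \cdot 2x = x & \text{for } y = 1 \text{ and } 0 < x < 1, \\ \dfrac{1}{2} \cdot 3x^2 = \dfrac{3}{2}x^2 & \text{for } y = 2 \text{ and } 0 < x < 1, \\ 0 & \text{otherwise.} \end{cases} \]

(a) It follows that for 0 < x < 1,

\[ f_1(x) = \sum_{y=1}^{2} f(x,y) = x + \frac{3}{2}x^2, \]

and f_1(x) = 0 otherwise.

(b) For y = 1, 2 and 0 < x < 1, we have

\[ \Pr(Y = y|X = x) = g_2(y|x) = \frac{f(x,y)}{f_1(x)}. \]

Hence,

\[ \Pr\left(Y = 1 \,\Big|\, X = \frac{1}{4}\right) = \frac{f\left(\frac{1}{4}, 1\right)}{f_1\left(\frac{1}{4}\right)} = \frac{\frac{1}{4}}{\frac{1}{4} + \frac{3}{2} \cdot \frac{1}{16}} = \frac{8}{11}. \]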

10. Let Y = 1 if a head is obtained when the coin is tossed and let Y = 0 if a tail is obtained. Then
Pr(Y = 1|X = x) = x and Pr(Y = 0|X = x) = 1 - x. In this exercise, the distribution of X is
continuous and the distribution of Y is discrete. Hence, the joint distribution of X and Y is a mixed
distribution as described in Sec. 3.4. The conditional p.f. of Y given that X = x is

\[ g_2(y|x) = \begin{cases} x & \text{for } y = 1, \\ 1 - x & \text{for } y = 0, \\ 0 & \text{otherwise.} \end{cases} \]

The marginal p.d.f. f_1(x) of X is given in the exercise, and the joint p.f./p.d.f. of X and Y is f(x,y)
= f_1(x)g_2(y|x). Thus, we have

\[ f(x,y) = \begin{cases} 6x^2(1 - x) & \text{for } 0 < x < 1 \text{ and } y = 1, \\ 6x(1 - x)^2 & \text{for } 0 < x < 1 \text{ and } y = 0, \\ 0 & \text{otherwise.} \end{cases} \]

Furthermore, for y = 0, 1,

\[ \Pr(Y = y) = f_2(y) = \int_0^1 f(x,y)\,dx. \]

Hence,

\[ \Pr(Y = 1) = \int_0^1 6x^2(1 - x)\,dx = \int_0^1 (6x^2 - 6x^3)\,dx = \frac{1}{2}. \]

(This result could also have been derived by noting that the p.d.f. f_1(x) is symmetric about the point
x = 1/2.)
It now follows that the conditional p.d.f. of X given that Y = 1 is, for 0 < x < 1,

\[ g_1(x|y = 1) = \frac{f(x,1)}{\Pr(Y = 1)} = \frac{6x^2(1 - x)}{1/2} = 12x^2(1 - x). \]


11. Let F_2 be the c.d.f. of Y. Since f_2 is continuous at both y_0 and y_1, we can write, for i = 0, 1,

\[ \Pr(Y \in A_i) = F_2(y_i + \epsilon) - F_2(y_i - \epsilon) = 2\epsilon f_2(y_i'), \]

where y_i' is within ε of y_i. This last equation follows from the mean value theorem of calculus. So

\[ \frac{\Pr(Y \in A_0)}{\Pr(Y \in A_1)} = \frac{f_2(y_0')}{f_2(y_1')}. \tag{S.3.1} \]

Since f_2 is continuous, lim_{ε→0} f_2(y_i') = f_2(y_i), and the limit of (S.3.1) is 0/f_2(y_1) = 0.


12. (a) The joint p.f./p.d.f. of X and Y is the product f_2(y)g_1(x|y).

\[ f(x,y) = \begin{cases} (2y)^x \exp(-3y)/x! & \text{if } y > 0 \text{ and } x = 0, 1, \ldots, \\ 0 & \text{otherwise.} \end{cases} \]

The marginal p.f. of X is obtained by integrating over y.

\[ f_1(x) = \int_0^{\infty} \frac{(2y)^x}{x!} \exp(-3y)\,dy = \frac{1}{3}\left(\frac{2}{3}\right)^x, \]

for x = 0, 1, ....
(b) The conditional p.d.f. of Y given X = 0 is the ratio of the joint p.f./p.d.f. to f_1(0).

\[ g_2(y|0) = \frac{(2y)^0 \exp(-3y)/0!}{(1/3)(2/3)^0} = 3\exp(-3y), \]

for y > 0.
(c) The conditional p.d.f. of Y given X = 1 is the ratio of the joint p.f./p.d.f. to f_1(1).

\[ g_2(y|1) = \frac{(2y)^1 \exp(-3y)/1!}{(1/3)(2/3)^1} = 9y\exp(-3y), \]

for y > 0.
(d) The ratio of the two conditional p.d.f.'s is

\[ \frac{g_2(y|1)}{g_2(y|0)} = \frac{9y\exp(-3y)}{3\exp(-3y)} = 3y. \]

The ratio is greater than 1 if y > 1/3. This corresponds to the intuition that if we observe more
calls, then we should think the rate is higher.
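The closed form for the marginal p.f. in part (a) can be compared with a direct numerical integration of the joint p.f./p.d.f. The Python sketch below is an illustrative check only; the midpoint rule, the truncation point 20, and the step count are arbitrary choices made here, and the function names are hypothetical.

```python
import math

def joint(x, y):
    """Joint p.f./p.d.f. (2y)^x exp(-3y)/x! for y > 0 and x = 0, 1, ..."""
    return (2 * y) ** x * math.exp(-3 * y) / math.factorial(x)

def marginal_x_numeric(x, upper=20.0, steps=40_000):
    """Midpoint-rule approximation to the integral of joint(x, y) over 0 < y < upper."""
    h = upper / steps
    return h * sum(joint(x, (i + 0.5) * h) for i in range(steps))

def marginal_x_closed(x):
    """Closed form from part (a): (1/3)(2/3)^x."""
    return (1 / 3) * (2 / 3) ** x

for x in range(4):
    print(x, marginal_x_numeric(x), marginal_x_closed(x))
```

The tail of the integrand beyond y = 20 is negligible because of the factor exp(-3y), so truncating the integral there has no visible effect at this precision.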


13. There are four different treatments on which we are asked to condition. The marginal p.f. of treatment 
Y is given in the bottom row of Table 3.6 in the text. The conditional p.f. of response given each 
treatment is the ratio of the two rows above that to the bottom row: 


gel) = 1 oceans rae 
gi(zl2) = cae ot 
nis) = { f-nazt tee 
vii = { Egfeome es 


The fourth one looks quite different from the others, especially from the second. 


3.7 Multivariate Distributions 


Commentary 


The material around Definition 3.7.8 and Example 3.7.8 reintroduces the concept of conditionally independent 
random variables. This concept is important in Bayesian inference, but outside of Bayesian inference, it 
generally appears only in more advanced applications such as expert systems and latent variable models. 
If an instructor is going to forego all discussion of Bayesian inference then this material (and Exercises 13 
and 14) could be skipped. 


Solutions to Exercises 


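1. (a) We have

\[ \int_0^1 \int_0^1 \int_0^1 f(x_1,x_2,x_3)\,dx_1\,dx_2\,dx_3 = 3c. \]

Since the value of this integral must be equal to 1, it follows that c = 1/3.
(b) For 0 < x_1 < 1 and 0 < x_3 < 1,

\[ f_{13}(x_1,x_3) = \int_0^1 f(x_1,x_2,x_3)\,dx_2 = \frac{1}{3}(x_1 + 1 + 3x_3). \]

(c) The conditional p.d.f. of X_3 given that X_1 = 1/4 and X_2 = 3/4 is, for 0 < x_3 < 1,

\[ g_3\left(x_3 \,\Big|\, x_1 = \frac{1}{4}, x_2 = \frac{3}{4}\right) = \frac{f\left(\frac{1}{4}, \frac{3}{4}, x_3\right)}{f_{12}\left(\frac{1}{4}, \frac{3}{4}\right)} = \frac{7}{13} + \frac{12}{13}x_3. \]

Therefore,

\[ \Pr\left(X_3 < \frac{1}{2} \,\Big|\, X_1 = \frac{1}{4}, X_2 = \frac{3}{4}\right) = \int_0^{1/2} \left(\frac{7}{13} + \frac{12}{13}x_3\right) dx_3 = \frac{5}{13}. \]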

2. (a) First, integrate over x_1. We need to compute \int_0^1 c x_1^{1+x_2+x_3}(1 - x_1)^{3-x_2-x_3}\,dx_1. The two exponents
always add to 4 and each is always at least 1. So the possible pairs of exponents are (1,3), (2,2),
and (3,1). By the symmetry of the function, the first and last will give the same value of the
integral. In this case, the values are

\[ \int_0^1 c[x_1^3 - x_1^4]\,dx_1 = \frac{c}{4} - \frac{c}{5} = \frac{c}{20}. \tag{S.3.2} \]

In the other case, the integral is

\[ \int_0^1 c[x_1^2 - 2x_1^3 + x_1^4]\,dx_1 = \frac{c}{3} - \frac{c}{2} + \frac{c}{5} = \frac{c}{30}. \tag{S.3.3} \]

Finally, sum over the possible (x_2,x_3) pairs. The mapping between (x_2,x_3) values and the
exponents in the integral is as follows:

(x_2, x_3)    Exponents
(0, 0)        (1, 3)
(0, 1)        (2, 2)
(1, 0)        (2, 2)
(1, 1)        (3, 1)

Summing over the four possible (x_2,x_3) pairs gives the sum of c/6, so c = 6.


(b) The marginal joint p.f. of (X_2, X_3) is given by setting c = 6 in (S.3.2) and (S.3.3) and using the
above table.

\[ f_{23}(x_2,x_3) = \begin{cases} 0.3 & \text{if } (x_2,x_3) \in \{(1,1),(0,0)\}, \\ 0.2 & \text{if } (x_2,x_3) \in \{(1,0),(0,1)\}. \end{cases} \]

(c) The conditional p.d.f. of X_1 given X_2 = 1 and X_3 = 1 is 1/0.3 times the joint p.f./p.d.f. evaluated
at x_2 = x_3 = 1:

\[ g_1(x_1|1,1) = \begin{cases} 20x_1^3(1 - x_1) & \text{if } 0 < x_1 < 1, \\ 0 & \text{otherwise.} \end{cases} \]


3. The p.d.f. should be positive for all x_i > 0 not just for all x_i > 1 as stated in early printings. This will
match the answers in the back of the text.
(a) We have

\[ \int_0^{\infty} \int_0^{\infty} \int_0^{\infty} f(x_1,x_2,x_3)\,dx_1\,dx_2\,dx_3 = \frac{1}{6}c. \]

Since the value of this integral must be equal to 1, it follows that c = 6. If one used x_i > 1 instead,
then the integral would equal c exp(-6)/6, so that c = 6 exp(6).


(b) For x_1 > 0, x_3 > 0,

\[ f_{13}(x_1,x_3) = \int_0^{\infty} f(x_1,x_2,x_3)\,dx_2 = 3\exp[-(x_1 + 3x_3)]. \]

If one used x_i > 1 instead, then for x_1 > 1 and x_3 > 1, f_{13}(x_1,x_3) = 3 exp(-x_1 - 3x_3 + 4).


(c) It is helpful at this stage to recognize that the random variables X_1, X_2, and X_3 are independent
because their joint p.d.f. f(x_1,x_2,x_3) can be factored as in Eq. (3.7.7); i.e., for x_i > 0 (i = 1, 2, 3),

\[ f(x_1,x_2,x_3) = (\exp(-x_1))(2\exp(-2x_2))(3\exp(-3x_3)). \]

It follows that

\[ \Pr(X_1 < 1|X_2 = 2, X_3 = 1) = \Pr(X_1 < 1) = \int_0^1 f_1(x_1)\,dx_1 = \int_0^1 \exp(-x_1)\,dx_1 = 1 - \frac{1}{e}. \]

This answer could also be obtained without explicitly using the independence of X_1, X_2, and X_3
by calculating first the marginal joint p.d.f.

\[ f_{23}(x_2,x_3) = \int_0^{\infty} f(x_1,x_2,x_3)\,dx_1, \]

then calculating the conditional p.d.f.

\[ g_1(x_1|x_2 = 2, x_3 = 1) = \frac{f(x_1,2,1)}{f_{23}(2,1)}, \]

and finally calculating the probability

\[ \Pr(X_1 < 1|X_2 = 2, X_3 = 1) = \int_0^1 g_1(x_1|x_2 = 2, x_3 = 1)\,dx_1. \]

If one used x_i > 1 instead, then the probability in this part is 0.


4. The joint p.d.f. f(x_1,x_2,x_3) is constant over the cube S. Since

\[ \iiint_S dx_1\,dx_2\,dx_3 = \int_0^1 \int_0^1 \int_0^1 dx_1\,dx_2\,dx_3 = 1, \]

it follows that f(x_1,x_2,x_3) = 1 for (x_1,x_2,x_3) ∈ S. Hence, the probability of any subset of S will be
equal to the volume of that subset.


(a) The set of points such that (x_1 - 1/2)^2 + (x_2 - 1/2)^2 + (x_3 - 1/2)^2 ≤ 1/4 is a sphere of radius 1/2
with center at the point (1/2, 1/2, 1/2). Hence, this sphere is entirely contained within the cube
S. Since the volume of any sphere is 4πr^3/3, the volume of this sphere, and also its probability, is
4π(1/2)^3/3 = π/6.

(b) The set of points such that x_1^2 + x_2^2 + x_3^2 ≤ 1 is a sphere of radius 1 with center at the origin
(0, 0, 0). Hence, the volume of this sphere is 4π/3. However, only one octant of this sphere, the octant
in which all three coordinates are nonnegative, lies in S. Hence, the volume of the intersection of
the sphere with the set S, and also its probability, is (1/8)(4π/3) = π/6.
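Because probabilities under the uniform distribution on the cube are volumes, both parts can be checked by simulation. The Python sketch below is an illustrative Monte Carlo check, not part of the original solution; the seed, the number of trials, and the function name are arbitrary choices made here.

```python
import math
import random

def estimate_sphere_probabilities(trials=200_000, seed=1):
    """Estimate the probabilities in parts (a) and (b) from uniform points in the unit cube."""
    rng = random.Random(seed)
    in_a = in_b = 0
    for _ in range(trials):
        x1, x2, x3 = rng.random(), rng.random(), rng.random()
        # Part (a): sphere of radius 1/2 centered at (1/2, 1/2, 1/2).
        if (x1 - 0.5) ** 2 + (x2 - 0.5) ** 2 + (x3 - 0.5) ** 2 <= 0.25:
            in_a += 1
        # Part (b): the octant of the unit sphere that lies in the cube.
        if x1 ** 2 + x2 ** 2 + x3 ** 2 <= 1.0:
            in_b += 1
    return in_a / trials, in_b / trials

est_a, est_b = estimate_sphere_probabilities()
print(est_a, est_b, math.pi / 6)
```

Both estimates should be close to π/6, which is about 0.5236, even though the two events are different subsets of the cube.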
5. (a) The probability that all n independent components will function properly is the product of their
individual probabilities and is therefore equal to \prod_{i=1}^{n} p_i.
(b) The probability that all n independent components will not function properly is the product of
their individual probabilities of not functioning properly and is therefore equal to \prod_{i=1}^{n} (1 - p_i). The
probability that at least one component will function properly is 1 - \prod_{i=1}^{n} (1 - p_i).
6. Since the n random variables X_1,...,X_n are i.i.d. and each has the p.f. f, the probability that a particular
variable X_i will be equal to a particular value x is f(x), and the probability that all n variables will
be equal to a particular value x is [f(x)]^n. Hence, the probability that all n variables will be equal,
without any specification of their common value, is \sum_x [f(x)]^n.


7. The probability that a particular variable X_i will lie in the interval (a, b) is p = \int_a^b f(x)\,dx. Since the
variables X_1,...,X_n are independent, the probability that exactly i of these variables will lie in the
interval (a, b) is \binom{n}{i} p^i (1 - p)^{n-i}. Therefore, the required probability is

\[ \sum_{i=k}^{n} \binom{n}{i} p^i (1 - p)^{n-i}. \]
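The tail sum in Exercise 7 is straightforward to evaluate once p is known. The Python sketch below is an illustrative addition; the function name and the example values of n, k, and p are hypothetical, and p stands for the integral of f over (a, b).

```python
from math import comb

def prob_at_least_k(n, k, p):
    """Probability that at least k of n independent variables fall in (a, b),
    where p is the probability that a single variable falls in (a, b)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Example with hypothetical values: n = 3 variables, at least k = 2 in the interval.
example = prob_at_least_k(3, 2, 0.5)
print(example)
```

Setting k = 0 returns 1 (up to rounding), since the sum then covers every possible count.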


8. For any given value x of X, the random variables Y_1,...,Y_n are i.i.d., each with the p.d.f. g(y|x).
Therefore, the conditional joint p.d.f. of Y_1,...,Y_n given that X = x is

\[ h(y_1,\ldots,y_n|x) = g(y_1|x) \cdots g(y_n|x) = \begin{cases} \dfrac{1}{x^n} & \text{for } 0 < y_i < x,\ i = 1,\ldots,n, \\ 0 & \text{otherwise.} \end{cases} \]

The joint p.d.f. of X and Y_1,...,Y_n is, therefore,

\[ f(x)h(y_1,\ldots,y_n|x) = \begin{cases} \dfrac{1}{n!}\exp(-x) & \text{for } 0 < y_i < x\ (i = 1,\ldots,n), \\ 0 & \text{otherwise.} \end{cases} \]

This joint p.d.f. is positive if and only if each y_i > 0 and x is greater than every y_i. In other words, x
must be greater than m = max{y_1,...,y_n}.


(a) For y_i > 0 (i = 1,...,n), the marginal joint p.d.f. of Y_1,...,Y_n is

\[ g_0(y_1,\ldots,y_n) = \int_{-\infty}^{\infty} f(x)h(y_1,\ldots,y_n|x)\,dx = \int_m^{\infty} \frac{1}{n!}\exp(-x)\,dx = \frac{1}{n!}\exp(-m). \]

(b) For y_i > 0 (i = 1,...,n), the conditional p.d.f. of X given that Y_i = y_i (i = 1,...,n) is

\[ g_1(x|y_1,\ldots,y_n) = \frac{f(x)h(y_1,\ldots,y_n|x)}{g_0(y_1,\ldots,y_n)} = \begin{cases} \exp(-(x - m)) & \text{for } x > m, \\ 0 & \text{otherwise.} \end{cases} \]


9. (a) Since X_i = X for i = 1, 2, we know that X_i has the same distribution as X. Since X has a
continuous distribution, then so does X_i for i = 1, 2.


(b) We know that Pr(X_1 = X_2) = 1. Let A = {(x_1,x_2) : x_1 = x_2}. Then Pr((X_1,X_2) ∈ A) = 1.
However, for every function f, \iint_A f(x_1,x_2)\,dx_1\,dx_2 = 0. So there is no possible joint p.d.f.


10. The marginal p.d.f. of Z is 2 exp(-2z), for z > 0. The coordinates of X are conditionally i.i.d. given
Z = z with p.d.f. z exp(-zx), for x > 0. This makes the joint p.d.f. of (Z, X) equal to 2z^5 exp(-z[2 +
x_1 + ... + x_5]) for all variables positive. The marginal joint p.d.f. of X is obtained by integrating z out
of this.

\[ f_0(x) = \int_0^{\infty} 2z^5 \exp(-z[2 + x_1 + \cdots + x_5])\,dz = \frac{240}{(2 + x_1 + \cdots + x_5)^6}, \quad \text{for all } x_i > 0. \]

Here, we use the formula \int_0^{\infty} y^k \exp(-y)\,dy = k! from Exercise 12 in Sec. 3.6. The conditional p.d.f. of
Z given X = (x_1,...,x_5) is then

\[ g_0(z|x) = \frac{(2 + x_1 + \cdots + x_5)^6}{120} z^5 \exp(-z[2 + x_1 + \cdots + x_5]), \]

for z > 0.


11. Since X_1,...,X_n are independent, their joint p.f., p.d.f., or p.f./p.d.f. factors as

\[ f(x_1,\ldots,x_n) = f_1(x_1) \cdots f_n(x_n), \]

where each f_i is a p.f. or p.d.f. If we sum or integrate over all x_j such that j ∉ {i_1,...,i_k} we obtain
the joint p.f., p.d.f., or p.f./p.d.f. of X_{i_1},...,X_{i_k} equal to f_{i_1}(x_{i_1}) ··· f_{i_k}(x_{i_k}), which is factored in a way
that makes it clear that X_{i_1},...,X_{i_k} are independent.


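12. Let h(y,w) be the marginal joint p.d.f. of Y and W, and let h_2(w) be the marginal p.d.f. of W. Then

\[ h(y,w) = \int f(y,z,w)\,dz, \]
\[ h_2(w) = \iint f(y,z,w)\,dz\,dy, \]
\[ g_1(y,z|w) = \frac{f(y,z,w)}{h_2(w)}, \]
\[ g_2(y|w) = \frac{h(y,w)}{h_2(w)} = \frac{\int f(y,z,w)\,dz}{h_2(w)} = \int g_1(y,z|w)\,dz. \]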

13. Let f(x_1,x_2,x_3,z) be the joint p.d.f. of (X_1,X_2,X_3,Z). Let f_{12}(x_1,x_2) be the marginal joint p.d.f. of
(X_1,X_2). Then the conditional p.d.f. of X_3 given (X_1,X_2) = (x_1,x_2) is

\[ \frac{\int f(x_1,x_2,x_3,z)\,dz}{f_{12}(x_1,x_2)} = \frac{\int g(x_1|z)g(x_2|z)g(x_3|z)f(z)\,dz}{f_{12}(x_1,x_2)} = \int g(x_3|z)\,\frac{g(x_1|z)g(x_2|z)f(z)}{f_{12}(x_1,x_2)}\,dz. \]

According to Bayes' theorem for random variables, the fraction in this last integral is g_0(z|x_1,x_2). Using
the specific formulas in the text, we can calculate the last integral as

\[ \int_0^{\infty} z\exp(-zx_3)\,\frac{1}{2}(2 + x_1 + x_2)^3 z^2 \exp(-z(2 + x_1 + x_2))\,dz \]
\[ = \frac{(2 + x_1 + x_2)^3}{2} \int_0^{\infty} z^3 \exp(-z(2 + x_1 + x_2 + x_3))\,dz \]
\[ = \frac{(2 + x_1 + x_2)^3}{2}\,\frac{6}{(2 + x_1 + x_2 + x_3)^4} = \frac{3(2 + x_1 + x_2)^3}{(2 + x_1 + x_2 + x_3)^4}. \]

The joint p.d.f. of (X_1,X_2,X_3) can be computed in a manner similar to the joint p.d.f. of (X_1,X_2) and
it is

\[ f_{123}(x_1,x_2,x_3) = \frac{12}{(2 + x_1 + x_2 + x_3)^4}. \]

The ratio of f_{123}(x_1,x_2,x_3) to f_{12}(x_1,x_2) is the conditional p.d.f. calculated above.


14. (a) We can substitute x_1 = 5 and x_2 = 7 in the conditional p.d.f. computed in Exercise 13.

\[ g_3(x_3|5,7) = \frac{3(2 + 5 + 7)^3}{(2 + 5 + 7 + x_3)^4} = \frac{8232}{(14 + x_3)^4}, \]

for x_3 > 0.
(b) The conditional probability we want is the integral of the p.d.f. above from 3 to ∞.

\[ \int_3^{\infty} \frac{8232}{(14 + x_3)^4}\,dx_3 = \left. -\frac{2744}{(14 + x_3)^3} \right|_{x_3=3}^{\infty} = 0.5585. \]

In Example 3.7.9, we computed the marginal probability Pr(X_3 > 3) = 0.4. Now that we have
observed two service times that are both longer than 3, namely 5 and 7, we think that the
probability of X_3 > 3 should be larger.
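The comparison between 0.5585 and 0.4 can be reproduced with one small function. The Python sketch below is an illustrative addition and rests on an assumption beyond the text: integrating the conditional p.d.f. of Exercise 13 from t to ∞ gives [s/(s + t)]^3 with s = 2 + x_1 + x_2, and the code assumes the same pattern, [s/(s + t)]^(k+1) with s equal to 2 plus the sum of the observations, after k observed service times in the same model. The function name is hypothetical.

```python
def prob_next_exceeds(t, observed):
    """Pr(next service time > t | observed times), assuming the model of
    Exercises 10 and 13: Z has p.d.f. 2exp(-2z) and service times are
    conditionally i.i.d. given Z = z with p.d.f. z exp(-zx)."""
    s = 2 + sum(observed)          # 2 + x_1 + ... + x_k
    k = len(observed)
    # Assumed closed form: integral of (k+1) s^(k+1) / (s + x)^(k+2) over x > t.
    return (s / (s + t)) ** (k + 1)

with_data = prob_next_exceeds(3, [5, 7])   # conditional on x_1 = 5, x_2 = 7
no_data = prob_next_exceeds(3, [])         # marginal probability
print(round(with_data, 4), no_data)
```

With no observations the function returns 2/5, matching the marginal probability quoted from Example 3.7.9, and with the observations 5 and 7 it returns (14/17)^3.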


15. Let A be an arbitrary n-dimensional set. Because Pr(W = c) = 1, we have

\[ \Pr((X_1,\ldots,X_n) \in A, W = w) = \begin{cases} \Pr((X_1,\ldots,X_n) \in A) & \text{if } w = c, \\ 0 & \text{otherwise.} \end{cases} \]

It follows that

\[ \Pr((X_1,\ldots,X_n) \in A|W = w) = \begin{cases} \Pr((X_1,\ldots,X_n) \in A) & \text{if } w = c, \\ 0 & \text{otherwise.} \end{cases} \]

Hence the conditional joint distribution of X_1,...,X_n given W is the same as the unconditional joint
distribution of X_1,...,X_n, which is the distribution of independent random variables.


3.8 Functions of a Random Variable 


Commentary 


A brief discussion of simulation appears at the end of this section. This can be considered a teaser for the 
more detailed treatment in Chapter 12. Simulation is becoming a very important tool in statistics and applied 
probability. Even those instructors who prefer not to cover Chapter 12 have the option of introducing the 
topic here for the benefit of students who will need to study simulation in more detail in another course. 

If you wish to use the statistical software R, then the function runif will be most useful. For the purposes
of this section, runif(n) will return n pseudo-uniform random numbers on the interval [0, 1]. Of course, either
n must be assigned a value before expecting R to understand runif(n), or one must put an explicit value
of n into the function. The following two options both produce 1,000 pseudo-uniform random numbers and
store them in an object called unumbs:

• unumbs=runif(1000)
• n=1000
  unumbs=runif(n)
Solutions to Exercises


1. The inverse transformation is $x = (1-y)^{1/2}$, whose derivative is $-(1-y)^{-1/2}/2$. The p.d.f. of Y is then

\[
g(y) = f\big([1-y]^{1/2}\big)\,(1-y)^{-1/2}/2 = \frac{3}{2}(1-y)^{1/2},
\]

for 0 < y < 1.


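2. Computing $y = x^2 - x$ for each possible value of x and collecting equal values of y gives the following p.f. of Y:

\[
\begin{array}{c|cccc}
y & 0 & 2 & 6 & 12 \\ \hline
g(y) & 2/7 & 2/7 & 2/7 & 1/7
\end{array}
\]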


3. It is seen from Fig. S.3.33 that as x varies over the interval 0 < x < 2, y varies over the interval 0 < y ≤ 1.

Figure S.3.33: Figure for Exercise 3 of Sec. 3.8.

Therefore, for 0 < y ≤ 1,

\[
\begin{aligned}
G(y) &= \Pr(Y \le y) = \Pr[X(2-X) \le y] = \Pr(X^2 - 2X \ge -y) \\
&= \Pr(X^2 - 2X + 1 \ge 1-y) = \Pr[(X-1)^2 \ge 1-y] \\
&= \Pr(X - 1 \le -\sqrt{1-y}) + \Pr(X - 1 \ge \sqrt{1-y}) \\
&= \Pr(X \le 1-\sqrt{1-y}) + \Pr(X \ge 1+\sqrt{1-y}) \\
&= \int_0^{1-\sqrt{1-y}} \frac{1}{2}x\,dx + \int_{1+\sqrt{1-y}}^{2} \frac{1}{2}x\,dx \\
&= 1-\sqrt{1-y}.
\end{aligned}
\]

It follows that, for 0 < y < 1,

\[
g(y) = \frac{dG(y)}{dy} = \frac{1}{2(1-y)^{1/2}}.
\]


4. The function $y = 4 - x^3$ is strictly decreasing for 0 < x < 2. When x = 0, we have y = 4, and when
x = 2 we have y = −4. Therefore, as x varies over the interval 0 < x < 2, y varies over the interval
−4 < y < 4. The inverse function is $x = (4-y)^{1/3}$ and

\[
\frac{dx}{dy} = -\frac{1}{3}(4-y)^{-2/3}.
\]

Therefore, for −4 < y < 4,

\[
g(y) = f\big[(4-y)^{1/3}\big]\left|\frac{dx}{dy}\right| = \frac{1}{2}(4-y)^{1/3}\cdot\frac{1}{3}(4-y)^{-2/3} = \frac{1}{6(4-y)^{1/3}}.
\]


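5. If y = ax + b, the inverse function is x = (y − b)/a and dx/dy = 1/a. Therefore,

\[
g(y) = \frac{1}{|a|}\,f\!\left(\frac{y-b}{a}\right).
\]

6. X lies between 0 and 2 if and only if Y lies between 2 and 8. Therefore, it follows from Exercise 5 (with the p.d.f. f from Exercise 3) that
for 2 < y < 8,

\[
g(y) = \frac{1}{3}\,f\!\left(\frac{y-2}{3}\right) = \frac{1}{18}(y-2).
\]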


7. (a) If $y = x^2$, then as x varies over the interval (0,1), y also varies over the interval (0,1). Also,
$x = y^{1/2}$ and $dx/dy = y^{-1/2}/2$. Hence, for 0 < y < 1,

\[
g(y) = f(y^{1/2})\left|\frac{dx}{dy}\right| = 1\cdot\frac{1}{2}y^{-1/2} = \frac{1}{2}y^{-1/2}.
\]

(b) If $y = -x^3$, then as x varies over the interval (0,1), y varies over the interval (−1,0). Also,
$x = -y^{1/3}$ and $dx/dy = -y^{-2/3}/3$. Hence, for −1 < y < 0,

\[
g(y) = f(-y^{1/3})\left|\frac{dx}{dy}\right| = \frac{1}{3}|y|^{-2/3}.
\]

(c) If $y = x^{1/2}$, then as x varies over the interval (0,1), y also varies over the interval (0,1). Also,
$x = y^2$ and dx/dy = 2y. Hence, for 0 < y < 1, $g(y) = f(y^2)\,2y = 2y$.


8. As x varies over all positive values, y also varies over all positive values. Also, $x = y^2$ and dx/dy = 2y.
Therefore, for y > 0,

\[
g(y) = f(y^2)(2y) = 2y\exp(-y^2).
\]


9. The c.d.f. G(y) corresponding to the p.d.f. g(y) is, for 0 < y < 2,

\[
G(y) = \int_0^y g(t)\,dt = \int_0^y \frac{3}{8}t^2\,dt = \frac{1}{8}y^3.
\]

We know that the c.d.f. of the random variable $Y = G^{-1}(X)$ will be G. We must therefore determine
the inverse function $G^{-1}$. If $X = G(Y) = Y^3/8$ then $Y = G^{-1}(X) = 2X^{1/3}$. It follows that $Y = 2X^{1/3}$.
10. For 0 < x < 2, the c.d.f. of X is

\[
F(x) = \int_0^x f(t)\,dt = \int_0^x \frac{1}{2}t\,dt = \frac{1}{4}x^2.
\]

Therefore, by the probability integral transformation, we know that $U = F(X) = X^2/4$ will have
the uniform distribution on the interval [0,1]. Since U has this uniform distribution, we know from
Exercise 9 that $Y = 2U^{1/3}$ will have the required p.d.f. g. Therefore, the required transformation is
$Y = 2U^{1/3} = (2X^2)^{1/3}$.


11. We can use the probability integral transformation if we can find the inverse of the c.d.f. The c.d.f. is,
for 0 < y < 1,

\[
G(y) = \int_0^y g(t)\,dt = \frac{1}{2}\int_0^y (2t+1)\,dt = \frac{1}{2}(y^2 + y).
\]

The inverse of this function can be found by setting G(y) = p and solving for y:

\[
\frac{1}{2}(y^2 + y) = p; \qquad y^2 + y - 2p = 0; \qquad y = \frac{-1 + (1+8p)^{1/2}}{2}.
\]

So, we should generate four independent uniform pseudo-random variables $P_1, P_2, P_3, P_4$, and let
$Y_i = [-1 + (1+8P_i)^{1/2}]/2$ for i = 1, 2, 3, 4.
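
The recipe in this solution can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than part of the original solution: the names g_cdf and g_quantile are hypothetical, and the seed is an arbitrary choice made only so that the run is reproducible.

```python
import math
import random


def g_cdf(y):
    """C.d.f. G(y) = (y^2 + y) / 2 for 0 <= y <= 1."""
    return 0.5 * (y * y + y)


def g_quantile(p):
    """Inverse of G: the root of y^2 + y - 2p = 0 that lies in [0, 1]."""
    return (-1.0 + math.sqrt(1.0 + 8.0 * p)) / 2.0


rng = random.Random(0)                          # fixed seed (arbitrary) for reproducibility
uniforms = [rng.random() for _ in range(4)]     # P_1, ..., P_4
sample = [g_quantile(p) for p in uniforms]      # Y_1, ..., Y_4
print(sample)
```

Applying g_cdf to each generated value recovers the uniform number it came from, which is a quick way to confirm that the quadratic was solved with the correct root.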


12. Let X have the uniform distribution on [0,1], and let F be a c.d.f. Let $F^{-1}(p)$ be defined as the smallest
x such that F(x) ≥ p. Define $Y = F^{-1}(X)$. We need to show that Pr(Y ≤ y) = F(y) for all y. First,
suppose that y is the unique x such that F(x) = F(y). Then Y ≤ y if and only if X ≤ F(y). Since
X has a uniform distribution, Pr(X ≤ F(y)) = F(y). Next, suppose that F(x) = F(y) for all x in the
interval [a,b) or [a,b] with b > a, and suppose that F(x) < F(y) for all x < a. Then $F^{-1}(X) \le y$ if
and only if X ≤ F(a) = F(y). Once again Pr(X ≤ F(y)) = F(y).


13. The inverse transformation is z = 1/t with derivative $-1/t^2$. The p.d.f. of T is

\[
g(t) = f(1/t)/t^2 = 2\exp(-2/t)/t^2,
\]

for t > 0.


14. Let Y = cX + d. The inverse transformation is x = (y − d)/c. Assume that c > 0. The derivative of
the inverse is 1/c. The p.d.f. of Y is

\[
g(y) = f([y-d]/c)/c = [c(b-a)]^{-1}, \quad \text{for } a \le (y-d)/c \le b.
\]

It is easy to see that a ≤ (y − d)/c ≤ b if and only if ca + d ≤ y ≤ cb + d, so g is the p.d.f. of the uniform
distribution on the interval [ca + d, cb + d]. If c < 0, the distribution of Y would be uniform on the
interval [cb + d, ca + d]. If c = 0, the distribution of Y is degenerate at the value d, i.e., Pr(Y = d) = 1.


15. Let F be the c.d.f. of X. First, find the c.d.f. of Y, namely, for y > 0,

\[
\Pr(Y \le y) = \Pr(X^2 \le y) = \Pr(-y^{1/2} \le X \le y^{1/2}) = F(y^{1/2}) - F(-y^{1/2}).
\]

Now, the p.d.f. of Y is the derivative of the above expression, namely,

\[
g(y) = \frac{d}{dy}\left[F(y^{1/2}) - F(-y^{1/2})\right] = \frac{f(y^{1/2})}{2y^{1/2}} + \frac{f(-y^{1/2})}{2y^{1/2}}.
\]

This equals the expression in the exercise.
16. Because 0 < X < 1 with probability 1, squaring X produces smaller values. There are wide intervals
of values of X that produce small values of $X^2$ but the values of X that produce large values of $X^2$ are
more limited. For example, to get Y ∈ [0.9, 1], you need X ∈ [0.9487, 1], whereas to get Y ∈ [0, 0.1] (an
interval of the same length), you need X ∈ [0, 0.3162], a much bigger set.


17. (a) According to the problem description, Y = 0 if X ≤ 100, Y = X − 100 if 100 < X ≤ 5100, and
Y = 5000 if X > 5100. So, Y = r(X), where

\[
r(x) =
\begin{cases}
0 & \text{if } x \le 100,\\
x - 100 & \text{if } 100 < x \le 5100,\\
5000 & \text{if } x > 5100.
\end{cases}
\]

(b) Let G be the c.d.f. of Y. Then G(y) = 0 for y < 0, and G(y) = 1 for y ≥ 5000. For 0 ≤ y < 5000,

\[
\Pr(Y \le y) = \Pr(r(X) \le y) = \Pr(X \le y + 100) = \int_0^{y+100} \frac{dx}{(1+x)^2} = 1 - \frac{1}{y+101}.
\]

In summary,

\[
G(y) =
\begin{cases}
0 & \text{if } y < 0,\\
1 - \dfrac{1}{y+101} & \text{if } 0 \le y < 5000,\\
1 & \text{if } y \ge 5000.
\end{cases}
\]


(c) There is positive probability that Y = 5000 (and also that Y = 0, since G jumps from 0 to 1 − 1/101 at y = 0),
but the rest of the distribution of Y is spread out in a continuous manner between 0 and 5000.
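
The mixed nature of this distribution can be read off from the jumps of G. The following sketch codes the c.d.f. from part (b) and measures its jumps numerically; the helper names and the step size eps are illustrative choices, not part of the original solution.

```python
def cdf_y(y):
    """C.d.f. G of Y from part (b)."""
    if y < 0:
        return 0.0
    if y < 5000:
        return 1.0 - 1.0 / (y + 101.0)
    return 1.0


def jump(cdf, y, eps=1e-9):
    """Approximate size of the jump of cdf at y, i.e. Pr(Y = y)."""
    return cdf(y) - cdf(y - eps)


mass_at_0 = jump(cdf_y, 0.0)          # equals Pr(X <= 100) = 100/101
mass_at_5000 = jump(cdf_y, 5000.0)    # equals Pr(X > 5100) = 1/5101
mass_inside = jump(cdf_y, 2500.0)     # essentially zero: G is continuous there
print(mass_at_0, mass_at_5000, mass_inside)
```

A jump of positive size marks a point mass, while the vanishing jump at an interior point reflects the continuous part of the distribution.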


3.9 Functions of Two or More Random Variables 


Commentary 


The material in this section can be very difficult, even for students who have studied calculus. Many textbooks 
at this level avoid the topic of general bivariate and multivariate transformations altogether. If an instructor 
wishes to avoid discussion of Jacobians and multivariate transformations, it might still be useful to introduce 
convolution, and the extremes of a random sample. The text is organized so that these topics appear early 
in the section, before any discussion of Jacobians. In the remainder of the text, the method of Jacobians is 
used in the following places: 


• The proof of Theorem 5.8.1, the derivation of the beta distribution p.d.f.

• The proof of Theorem 5.10.1, the derivation of the joint p.d.f. of the bivariate normal distribution.

• The proof of Theorem 8.3.1, the derivation of the joint distribution of the sample mean and sample
variance from a random sample of normal random variables.

• The proof of Theorem 8.4.1, the derivation of the p.d.f. of the t distribution.
Solutions to Exercises


1. The joint p.d.f. of $X_1$ and $X_2$ is

\[
f(x_1, x_2) =
\begin{cases}
1 & \text{for } 0 < x_1 < 1,\ 0 < x_2 < 1,\\
0 & \text{otherwise.}
\end{cases}
\]

By Eq. (3.9.5), the p.d.f. of Y is

\[
g(y) = \int_{-\infty}^{\infty} f(y - z, z)\,dz.
\]

The integrand is positive only for 0 < y − z < 1 and 0 < z < 1. Therefore, for 0 < y ≤ 1 it is positive
only for 0 < z < y and we have

\[
g(y) = \int_0^y 1\,dz = y.
\]

For 1 < y < 2, the integrand is positive only for y − 1 < z < 1 and we have

\[
g(y) = \int_{y-1}^{1} 1\,dz = 2 - y.
\]
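
The convolution integral of Eq. (3.9.5) can also be evaluated numerically as a sanity check on the triangular shape of g. This is a rough sketch with hypothetical helper names; the midpoint rule and the number of steps are arbitrary choices.

```python
def uniform_pdf(x):
    """P.d.f. of the uniform distribution on the interval (0, 1)."""
    return 1.0 if 0.0 < x < 1.0 else 0.0


def convolve_at(y, pdf1, pdf2, lo=0.0, hi=1.0, steps=10000):
    """Midpoint-rule approximation of the integral of pdf1(y - z) * pdf2(z) for z in [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        z = lo + (k + 0.5) * h
        total += pdf1(y - z) * pdf2(z)
    return total * h


for y in (0.25, 0.5, 1.0, 1.5, 1.75):
    print(y, round(convolve_at(y, uniform_pdf, uniform_pdf), 3))
```

The printed values rise linearly up to y = 1 and then fall linearly, in line with a p.d.f. equal to y on (0, 1] and 2 − y on (1, 2).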


2. Let f be the p.d.f. of $Y = X_1 + X_2$ found in Exercise 1, and let Z = Y/2. The inverse of this
transformation is y = 2z with derivative 2. The p.d.f. of Z is

\[
g(z) = 2f(2z) =
\begin{cases}
4z & \text{for } 0 < z \le 1/2,\\
4(1-z) & \text{for } 1/2 < z < 1,\\
0 & \text{otherwise.}
\end{cases}
\]


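3. The inverse transformation is:

\[
x_1 = y_1, \qquad x_2 = y_2/y_1, \qquad x_3 = y_3/y_2.
\]

Furthermore, the set S where $0 < x_i < 1$ for i = 1, 2, 3 corresponds to the set T where $0 < y_3 < y_2 < y_1 < 1$. We also have

\[
J = \det
\begin{pmatrix}
\dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} & \dfrac{\partial x_1}{\partial y_3}\\[2mm]
\dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2} & \dfrac{\partial x_2}{\partial y_3}\\[2mm]
\dfrac{\partial x_3}{\partial y_1} & \dfrac{\partial x_3}{\partial y_2} & \dfrac{\partial x_3}{\partial y_3}
\end{pmatrix}
= \det
\begin{pmatrix}
1 & 0 & 0\\[1mm]
-\dfrac{y_2}{y_1^2} & \dfrac{1}{y_1} & 0\\[2mm]
0 & -\dfrac{y_3}{y_2^2} & \dfrac{1}{y_2}
\end{pmatrix}
= \frac{1}{y_1 y_2}.
\]

Therefore, for $0 < y_3 < y_2 < y_1 < 1$, the joint p.d.f. of $Y_1$, $Y_2$, and $Y_3$ is

\[
g(y_1, y_2, y_3) = f\!\left(y_1, \frac{y_2}{y_1}, \frac{y_3}{y_2}\right)|J|
= 8\,y_1\,\frac{y_2}{y_1}\,\frac{y_3}{y_2}\cdot\frac{1}{y_1 y_2}
= \frac{8 y_3}{y_1 y_2}.
\]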
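4. As a convenient device, let $Z = X_1$. Then the transformation from $X_1$ and $X_2$ to Y and Z is a one-to-one
transformation between the set S where $0 < x_1 < 1$ and $0 < x_2 < 1$ and the set T where
0 < y < z < 1. The inverse transformation is

\[
x_1 = z, \qquad x_2 = \frac{y}{z}.
\]

Therefore,

\[
J = \det
\begin{pmatrix}
\dfrac{\partial x_1}{\partial y} & \dfrac{\partial x_1}{\partial z}\\[2mm]
\dfrac{\partial x_2}{\partial y} & \dfrac{\partial x_2}{\partial z}
\end{pmatrix}
= \det
\begin{pmatrix}
0 & 1\\[1mm]
\dfrac{1}{z} & -\dfrac{y}{z^2}
\end{pmatrix}
= -\frac{1}{z}.
\]

For 0 < y < z < 1, the joint p.d.f. of Y and Z is

\[
g(y, z) = f\!\left(z, \frac{y}{z}\right)|J| = \left(z + \frac{y}{z}\right)\left(\frac{1}{z}\right).
\]

It follows that for 0 < y < 1, the marginal p.d.f. of Y is

\[
g_1(y) = \int_y^1 g(y, z)\,dz = 2(1-y).
\]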


5. As a convenient device let $Y = X_2$. Then the transformation from $X_1$ and $X_2$ to Y and Z is a one-to-one
transformation between the set S where $0 < x_1 < 1$ and $0 < x_2 < 1$ and the set T where
0 < y < 1 and 0 < yz < 1. The inverse transformation is

\[
x_1 = yz, \qquad x_2 = y.
\]

Therefore,

\[
J = \det
\begin{pmatrix}
\dfrac{\partial x_1}{\partial z} & \dfrac{\partial x_1}{\partial y}\\[2mm]
\dfrac{\partial x_2}{\partial z} & \dfrac{\partial x_2}{\partial y}
\end{pmatrix}
= \det
\begin{pmatrix}
y & z\\
0 & 1
\end{pmatrix}
= y.
\]

The region where the p.d.f. of (Z, Y) is positive is in Fig. S.3.34.

Figure S.3.34: The region where the p.d.f. of (Z, Y) is positive in Exercise 5 of Sec. 3.9.

For 0 < y < 1 and 0 < yz < 1, the joint p.d.f. of Y and Z is

\[
g(y, z) = f(yz, y)|J| = (yz + y)(y).
\]

It follows that for 0 < z ≤ 1, the marginal p.d.f. of Z is

\[
g_2(z) = \int_0^1 g(y, z)\,dy = \frac{1}{3}(z + 1).
\]

Also, for z > 1,

\[
g_2(z) = \int_0^{1/z} g(y, z)\,dy = \frac{1}{3z^3}(1 + z).
\]


6. By Eq. (3.9.5) (with a change in notation),

\[
g(z) = \int_{-\infty}^{\infty} f(z - t, t)\,dt \quad \text{for } -\infty < z < \infty.
\]

However, the integrand is positive only for 0 < z − t < t < 1. Therefore, for 0 < z ≤ 1, it is positive
only for z/2 < t < z and we have

\[
g(z) = \int_{z/2}^{z} 2z\,dt = z^2.
\]

For 1 < z < 2, the integrand is positive only for z/2 < t < 1 and we have

\[
g(z) = \int_{z/2}^{1} 2z\,dt = z(2 - z).
\]
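7. Let $Z = -X_2$. Then the p.d.f. of Z is

\[
f_2(z) =
\begin{cases}
\exp(z) & \text{for } z < 0,\\
0 & \text{for } z \ge 0.
\end{cases}
\]

Since $X_1$ and Z are independent, the joint p.d.f. of $X_1$ and Z is

\[
f(x, z) =
\begin{cases}
\exp(-(x - z)) & \text{for } x > 0,\ z < 0,\\
0 & \text{otherwise.}
\end{cases}
\]

It now follows from Eq. (3.9.5) that the p.d.f. of $Y = X_1 - X_2 = X_1 + Z$ is

\[
g(y) = \int_{-\infty}^{\infty} f(y - z, z)\,dz.
\]

The integrand is positive only for y − z > 0 and z < 0. Therefore, for y ≤ 0,

\[
g(y) = \int_{-\infty}^{y} \exp(-(y - 2z))\,dz = \frac{1}{2}\exp(y).
\]

Also, for y > 0,

\[
g(y) = \int_{-\infty}^{0} \exp(-(y - 2z))\,dz = \frac{1}{2}\exp(-y).
\]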
8. We have

\[
\begin{aligned}
\Pr(Y_n \ge 0.99) &= 1 - \Pr(Y_n < 0.99) \\
&= 1 - \Pr(\text{All } n \text{ observations} < 0.99) \\
&= 1 - (0.99)^n.
\end{aligned}
\]

Next, $1 - (0.99)^n \ge 0.95$ if and only if

\[
(0.99)^n \le 0.05 \quad\text{or}\quad n\log(0.99) \le \log(0.05) \quad\text{or}\quad n \ge \frac{\log(0.05)}{\log(0.99)} \approx 298.1.
\]

So, n ≥ 299 is needed.
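
The sample-size calculation can be reproduced directly. In this sketch the function name and its parameters are hypothetical; the two loops simply guard against floating-point rounding near the boundary by checking the inequality itself.

```python
import math


def smallest_n(threshold=0.99, target=0.95):
    """Smallest n such that 1 - threshold**n >= target."""
    n = math.ceil(math.log(1.0 - target) / math.log(threshold))
    # Check the defining inequality directly in case of rounding error.
    while 1.0 - threshold ** n < target:
        n += 1
    while n > 1 and 1.0 - threshold ** (n - 1) >= target:
        n -= 1
    return n


print(smallest_n())
```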


9. It was shown in this section that the joint c.d.f. of $Y_1$ and $Y_n$ is, for $-\infty < y_1 < y_n < \infty$,

\[
G(y_1, y_n) = [F(y_n)]^n - [F(y_n) - F(y_1)]^n.
\]

Since F(y) = y for the given uniform distribution, we have

\[
\Pr(Y_1 \le 0.1,\ Y_n \le 0.8) = G(0.1, 0.8) = (0.8)^n - (0.7)^n.
\]

10. We have

\[
\Pr(Y_1 \le 0.1 \text{ and } Y_n \ge 0.8) = \Pr(Y_1 \le 0.1) - \Pr(Y_1 \le 0.1 \text{ and } Y_n \le 0.8).
\]

It was shown in this section that the c.d.f. of $Y_1$ is

\[
G_1(y) = 1 - [1 - F(y)]^n.
\]

Therefore, $\Pr(Y_1 \le 0.1) = G_1(0.1) = 1 - (0.9)^n$. Also, by Exercise 9,

\[
\Pr(Y_1 \le 0.1 \text{ and } Y_n \le 0.8) = (0.8)^n - (0.7)^n.
\]

Therefore,

\[
\Pr(Y_1 \le 0.1 \text{ and } Y_n \ge 0.8) = 1 - (0.9)^n - (0.8)^n + (0.7)^n.
\]


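11. The required probability is equal to

\[
\Pr\!\left(\text{All } n \text{ observations} < \tfrac{1}{3}\right) + \Pr\!\left(\text{All } n \text{ observations} > \tfrac{1}{3}\right) = \left(\frac{1}{3}\right)^n + \left(\frac{2}{3}\right)^n.
\]

This exercise could also be solved by using techniques similar to those used in Exercise 10.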


12. The p.d.f. $h_1(w)$ of W was derived in Example 3.9.8. Therefore,

\[
\Pr(W > 0.9) = \int_{0.9}^{1} h_1(w)\,dw = \int_{0.9}^{1} n(n-1)w^{n-2}(1-w)\,dw = 1 - n(0.9)^{n-1} + (n-1)(0.9)^n.
\]
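
As a check on the integration, the closed form can be compared with a direct numerical integral of the range p.d.f. $h_1(w) = n(n-1)w^{n-2}(1-w)$. The sketch below uses hypothetical function names, generalizes the cutoff 0.9 to a parameter w0, and relies on a plain midpoint rule.

```python
def range_tail_exact(n, w0=0.9):
    """Closed form: Pr(W > w0) = 1 - n * w0^(n-1) + (n-1) * w0^n."""
    return 1.0 - n * w0 ** (n - 1) + (n - 1) * w0 ** n


def range_tail_numeric(n, w0=0.9, steps=20000):
    """Midpoint-rule integral of h1(w) = n (n-1) w^(n-2) (1-w) over [w0, 1]."""
    h = (1.0 - w0) / steps
    total = 0.0
    for k in range(steps):
        w = w0 + (k + 0.5) * h
        total += n * (n - 1) * w ** (n - 2) * (1.0 - w)
    return total * h


for n in (2, 5, 20):
    print(n, range_tail_exact(n), range_tail_numeric(n))
```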
13. If X has the uniform distribution on the interval [0, 1], then aX + b (a > 0) has the uniform distribution
on the interval [b, a + b]. Therefore, 8X − 3 has the uniform distribution on the interval [−3, 5]. It
follows that if $X_1,\ldots,X_n$ form a random sample from the uniform distribution on the interval [0, 1],
then the n random variables $8X_1 - 3,\ldots,8X_n - 3$ will have the same joint distribution as a random
sample from the uniform distribution on the interval [−3, 5].

Next, it follows that if the range of the sample $X_1,\ldots,X_n$ is W, then the range of the sample
$8X_1 - 3,\ldots,8X_n - 3$ will be 8W. Therefore, if W is the range of a random sample from the uniform distribution
on the interval [0, 1], then Z = 8W will have the same distribution as the range of a random sample
from the uniform distribution on the interval [−3, 5].

The p.d.f. $h_1(w)$ of W was given in Example 3.9.8. Therefore, the p.d.f. g(z) of Z = 8W is

\[
g(z) = h_1\!\left(\frac{z}{8}\right)\frac{1}{8} = \frac{n(n-1)}{8}\left(\frac{z}{8}\right)^{n-2}\left(1 - \frac{z}{8}\right),
\]

for 0 < z < 8.

This p.d.f. g(z) could also have been derived from first principles as in Example 3.9.8.


14. Following the hint given in this exercise, we have

\[
\begin{aligned}
G(y) &= \Pr(\text{At least } n-1 \text{ observations are} \le y) \\
&= \Pr(\text{Exactly } n-1 \text{ observations are} \le y) + \Pr(\text{All } n \text{ observations are} \le y) \\
&= ny^{n-1}(1-y) + y^n = ny^{n-1} - (n-1)y^n.
\end{aligned}
\]

Therefore, for 0 < y < 1,

\[
g(y) = n(n-1)y^{n-2} - n(n-1)y^{n-1} = n(n-1)y^{n-2}(1-y).
\]

It is a curious result that for this uniform distribution, the p.d.f. of Y is the same as the p.d.f. of the
range W, as given in Example 3.9.8. There actually is intuition to support those two distributions
being the same.


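15. For any n sets of real numbers $A_1,\ldots,A_n$, we have

\[
\begin{aligned}
\Pr(Y_1 \in A_1,\ldots,Y_n \in A_n) &= \Pr[r_1(X_1) \in A_1,\ldots,r_n(X_n) \in A_n] \\
&= \Pr[r_1(X_1) \in A_1] \cdots \Pr[r_n(X_n) \in A_n] \\
&= \Pr(Y_1 \in A_1) \cdots \Pr(Y_n \in A_n).
\end{aligned}
\]

Therefore, $Y_1,\ldots,Y_n$ are independent by Definition 3.5.2.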


16. If f factors in the form given in this exercise, then there must exist a constant c > 0 such that the
marginal joint p.d.f. of $X_1$ and $X_2$ is

\[
f_{12}(x_1, x_2) = c\,g(x_1, x_2) \quad \text{for } (x_1, x_2) \in R^2,
\]

the marginal joint p.d.f. of $X_3$, $X_4$, and $X_5$ is

\[
f_{345}(x_3, x_4, x_5) = \frac{1}{c}\,h(x_3, x_4, x_5) \quad \text{for } (x_3, x_4, x_5) \in R^3,
\]

and, therefore, for every point $(x_1,\ldots,x_5) \in R^5$ we have

\[
f(x_1,\ldots,x_5) = f_{12}(x_1, x_2)\,f_{345}(x_3, x_4, x_5).
\]

It now follows that for any sets of real numbers $A_1$ and $A_2$,

\[
\begin{aligned}
\Pr(Y_1 \in A_1 \text{ and } Y_2 \in A_2)
&= \int\cdots\int_{\substack{r_1(x_1,x_2)\in A_1 \text{ and}\\ r_2(x_3,x_4,x_5)\in A_2}} f(x_1,\ldots,x_5)\,dx_1\cdots dx_5 \\
&= \iint_{r_1(x_1,x_2)\in A_1} f_{12}(x_1, x_2)\,dx_1\,dx_2 \iiint_{r_2(x_3,x_4,x_5)\in A_2} f_{345}(x_3, x_4, x_5)\,dx_3\,dx_4\,dx_5 \\
&= \Pr(Y_1 \in A_1)\Pr(Y_2 \in A_2).
\end{aligned}
\]

Therefore, by definition, $Y_1$ and $Y_2$ are independent.
17. We need to transform (X, Y) to (Z, W), where Z = XY and W = Y. The joint p.d.f. of (X, Y) is

\[
f(x, y) =
\begin{cases}
y\exp(-xy)\,f_2(y) & \text{if } x > 0,\\
0 & \text{otherwise.}
\end{cases}
\]

The inverse transformation is x = z/w and y = w. The Jacobian is

\[
J = \det
\begin{pmatrix}
\dfrac{1}{w} & -\dfrac{z}{w^2}\\[2mm]
0 & 1
\end{pmatrix}
= \frac{1}{w}.
\]

The joint p.d.f. of (Z, W) is

\[
g(z, w) = f(z/w, w)/w = w\exp(-z)f_2(w)/w = \exp(-z)f_2(w), \quad \text{for } z > 0.
\]

This is clearly factored in the appropriate way to show that Z and W are independent. Indeed, if we
integrate g(z, w) over w, we obtain the marginal p.d.f. of Z, namely $g_1(z) = \exp(-z)$, for z > 0. This
is the same as the function in (3.9.18).


18. We need to transform (X, Y) to (Z, W), where Z = X/Y and W = Y. The joint p.d.f. of (X, Y) is

\[
f(x, y) =
\begin{cases}
3x^2 f_2(y)/y^3 & \text{if } 0 < x < y,\\
0 & \text{otherwise.}
\end{cases}
\]

The inverse transformation is x = zw and y = w. The Jacobian is

\[
J = \det
\begin{pmatrix}
w & z\\
0 & 1
\end{pmatrix}
= w.
\]

The joint p.d.f. of (Z, W) is

\[
g(z, w) = f(zw, w)\,w = 3z^2w^2 f_2(w)\,w/w^3 = 3z^2 f_2(w), \quad \text{for } 0 < z < 1.
\]

This is clearly factored in the appropriate way to show that Z and W are independent. Indeed, if we
integrate g(z, w) over w, we obtain the marginal p.d.f. of Z, namely $g_1(z) = 3z^2$, for 0 < z < 1.
19. This is a convolution. Let g be the p.d.f. of Y. By (3.9.5) we have, for y > 0,

\[
g(y) = \int f(y - z)f(z)\,dz = \int_0^y e^{z-y}e^{-z}\,dz = ye^{-y}.
\]

Clearly, g(y) = 0 for y < 0, so the p.d.f. of Y is

\[
g(y) =
\begin{cases}
ye^{-y} & \text{for } y > 0,\\
0 & \text{otherwise.}
\end{cases}
\]


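20. Let $f_1$ stand for the marginal p.d.f. of $X_1$, namely $f_1(x) = \int f(x, x_2)\,dx_2$. With $a_2 = 0$ and $a_1 = a$ in
(3.9.2) we get

\[
g(y) = \int f\!\left(\frac{y-b}{a}, x_2\right)\frac{1}{|a|}\,dx_2 = \frac{1}{|a|}\,f_1\!\left(\frac{y-b}{a}\right),
\]

which is the same as (3.8.1).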


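21. Transforming to $Z_1 = X_1/X_2$ and $Z_2 = X_1$ has the inverse $X_1 = Z_2$ and $X_2 = Z_2/Z_1$. The set of
values where the joint p.d.f. of $Z_1$ and $Z_2$ is positive is where $0 < z_2 < 1$ and $0 < z_2/z_1 < 1$. This can
be written as $0 < z_2 < \min\{1, z_1\}$. The Jacobian is the determinant of the matrix

\[
\begin{pmatrix}
0 & 1\\
-z_2/z_1^2 & 1/z_1
\end{pmatrix},
\]

whose absolute value is $|z_2/z_1^2|$. The joint p.d.f. of $Z_1$ and $Z_2$ is then

\[
g(z_1, z_2) = 4z_2\,\frac{z_2}{z_1}\,\frac{z_2}{z_1^2} = \frac{4z_2^3}{z_1^3},
\]

for $0 < z_2 < \min\{1, z_1\}$. Integrating $z_2$ out of this yields, for $z_1 > 0$,

\[
g_1(z_1) = \int_0^{\min\{1, z_1\}} \frac{4z_2^3}{z_1^3}\,dz_2 = \frac{\min\{z_1, 1\}^4}{z_1^3} =
\begin{cases}
z_1 & \text{if } z_1 < 1,\\
z_1^{-3} & \text{if } z_1 \ge 1.
\end{cases}
\]

This is the same thing we got in Example 3.9.11.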
3.10 Markov Chains


Commentary 


Instructors can discuss this section at any time that they find convenient or they can omit it entirely. 
Instructors who wish to cover Sec. 12.5 (Markov chain Monte Carlo) and who wish to give some theoretical 
justification for the methodology will want to discuss some of this material before covering Sec. 12.5. On 
the other hand, one could cover Sec. 12.5 and skip the justification for the methodology without introducing 
Markov chains at all. 

Students may notice the following property, which is exhibited in some of the exercises at the end of this
section: Suppose that the Markov chain is in a given state $s_i$ at time n. Then the probability of being in a
particular state $s_j$ a few periods later, say at time n + 3 or n + 4, is approximately the same for each possible
given state $s_i$ at time n. For example, in Exercise 2, the probability that it will be sunny on Saturday is
approximately the same regardless of whether it is sunny or cloudy on the preceding Wednesday, three days 
earlier. In Exercise 5, for given probabilities on Wednesday, the probability that it will be cloudy on Friday is 
approximately the same as the probability that it will be cloudy on Saturday. In Exercise 7, the probability 
that the student will be on time on the fourth day of class is approximately the same regardless of whether 
he was late or on time on the first day of class. In Exercise 10, the probabilities for n = 3 and n = 4, are 
generally similar. In Exercise 11, the answers in part (a) and part (b) are almost identical. 

This property is a reflection of the fact that for many Markov chains, the nth power of the transition
matrix $P^n$ will converge, as n → ∞, to a matrix for which all the elements in any given column are equal.
For example, in Exercise 2, the matrix $P^n$ converges to the following matrix:

\[
\begin{pmatrix}
2/3 & 1/3\\
2/3 & 1/3
\end{pmatrix}.
\]


This type of convergence is an example of Theorem 3.10.4. This theorem, and analogs for more com- 
plicated Markov chains, provide the justification of the Markov chain Monte Carlo method introduced in 
Sec. 12.5. 
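
The convergence described above is easy to observe numerically. The sketch below assumes the weather transition matrix with rows (0.7, 0.3) and (0.6, 0.4), in the order sunny, cloudy; these entries are not printed in this commentary and are inferred here from the answers to Exercises 2, 3, and 5. The helper names are illustrative.

```python
def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_pow(p, n):
    """n-th power of a square matrix by repeated multiplication."""
    size = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, p)
    return result


# Assumed weather chain: state 0 = sunny, state 1 = cloudy.
P = [[0.7, 0.3],
     [0.6, 0.4]]

for n in (1, 3, 10):
    print(n, [[round(x, 4) for x in row] for row in mat_pow(P, n)])
```

Already at n = 3 the two rows agree to about three decimal places, and for larger n both rows approach (2/3, 1/3).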

Solutions to Exercises 


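1. The transition matrix for this Markov chain is

\[
P = \begin{pmatrix}
1/3 & 2/3\\
2/3 & 1/3
\end{pmatrix}.
\]

(a) If we multiply the initial probability vector by this matrix we get

\[
vP = \left(\frac{1}{2}\cdot\frac{1}{3} + \frac{1}{2}\cdot\frac{2}{3},\ \frac{1}{2}\cdot\frac{2}{3} + \frac{1}{2}\cdot\frac{1}{3}\right) = \left(\frac{1}{2}, \frac{1}{2}\right).
\]

(b) The two-step transition matrix is $P^2$, namely

\[
P^2 = \begin{pmatrix}
5/9 & 4/9\\
4/9 & 5/9
\end{pmatrix}.
\]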

2. (a) 0.4, the lower right corner of the matrix.
(b) (0.7)(0.7) = 0.49.
(c) The probability that it will be cloudy on the next three days is (0.4)(0.4)(0.4) = 0.064. The desired
probability is 1 − 0.064 = 0.936.


3. Saturday is three days after Wednesday, so we first compute

\[
P^3 = \begin{pmatrix}
0.667 & 0.333\\
0.666 & 0.334
\end{pmatrix}.
\]

Therefore, the answers are (a) 0.667 and (b) 0.666. 
4. (a) From Exercise 3, the probability that it will be sunny on Saturday is 0.667. Therefore, the answer 
is (0.667)(0.7) = 0.4669. 
(b) From Exercise 3, the probability that it will be sunny on Saturday is 0.666. Therefore, the answer 
is (0.666)(0.7) = 0.4662. 
5. Let v = (0.2, 0.8).
(a) The answer will be the second component of the vector vP. We easily compute vP = (0.62, 0.38),
so the probability is 0.38.

(b) The answer will be the second component of $vP^2$. We can compute $vP^2$ by multiplying vP by
P to get (0.662, 0.338), so the probability is 0.338.

(c) The answer will be the second component of $vP^3$. Since $vP^3 = (0.6662, 0.3338)$, the answer is
0.3338.


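6. In this exercise (and the next two) the transition matrix P is

\[
\begin{array}{l|cc}
 & \text{Late} & \text{On time}\\ \hline
\text{Late} & 0.2 & 0.8\\
\text{On time} & 0.5 & 0.5
\end{array}
\]

(a) (0.8)(0.5)(0.5) = 0.2.
(b) (0.5)(0.2)(0.2) = 0.02.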


7. Using the matrix in Exercise 6, it is found that

\[
P^3 = \begin{pmatrix}
0.368 & 0.632\\
0.395 & 0.605
\end{pmatrix}.
\]


Therefore, the answers are (a) 0.632 and (b) 0.605. 
8. Let v = (0.7, 0.3).

(a) The answer will be the first component of the vector vP. We can easily compute vP = (0.29, 0.71),
so the answer is 0.29.

(b) The answer will be the second component of the vector $vP^3$. We compute $vP^3 = (0.3761, 0.6239)$,
so the answer is 0.6239.
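
The vectors in Exercise 8 can be reproduced by repeatedly multiplying the probability vector by P. The sketch assumes the matrix with rows (0.2, 0.8) for Late and (0.5, 0.5) for On time, which is the matrix consistent with the products in Exercise 6 and with $P^3$ in Exercise 7; the function name is hypothetical.

```python
def step(v, p):
    """One transition of the chain: the row vector v times the matrix p."""
    return [sum(v[i] * p[i][j] for i in range(len(v))) for j in range(len(p[0]))]


# Assumed transition matrix; state 0 = Late, state 1 = On time.
P = [[0.2, 0.8],
     [0.5, 0.5]]

v = [0.7, 0.3]            # initial probabilities of Late and On time
history = [v]
for _ in range(3):
    history.append(step(history[-1], P))

for k, probs in enumerate(history):
    print(k, [round(x, 4) for x in probs])
```

Propagating the vector one step at a time avoids forming $P^3$ explicitly, which is convenient when only one initial distribution is of interest.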
9. (a) It is found that

\[
P^2 = \begin{pmatrix}
3/16 & 7/16 & 1/8 & 1/4\\
0 & 1 & 0 & 0\\
3/8 & 1/8 & 1/4 & 1/4\\
1/4 & 3/8 & 3/16 & 3/16
\end{pmatrix}.
\]

The answer is given by the element in the third row and second column.

(b) The answer is the element in the first row and third column of $P^3$, namely 0.125.


10. Let $v = \left(\frac{1}{8}, \frac{1}{4}, \frac{3}{8}, \frac{1}{4}\right)$.

(a) The probabilities for $s_1$, $s_2$, $s_3$, and $s_4$ will be the four components of the vector vP.
(b) The required probabilities will be the four components of $vP^2$.

(c) The required probabilities will be the four components of $vP^3$.


11. The transition matrix for the states A and B is

\[
P = \begin{pmatrix}
1/3 & 2/3\\
2/3 & 1/3
\end{pmatrix}.
\]

It is found that

\[
P^4 = \begin{pmatrix}
41/81 & 40/81\\
40/81 & 41/81
\end{pmatrix}.
\]

Therefore, the answers are (a) 40/81 and (b) 41/81.


12. (a) Using the transition probabilities stated in the exercise, we construct

\[
P = \begin{pmatrix}
0.0 & 0.2 & 0.8\\
0.6 & 0.0 & 0.4\\
0.5 & 0.5 & 0.0
\end{pmatrix}.
\]

(b) It is found that

\[
P^2 = \begin{pmatrix}
0.52 & 0.40 & 0.08\\
0.20 & 0.32 & 0.48\\
0.30 & 0.10 & 0.60
\end{pmatrix}.
\]

Let $v = \left(\frac{1}{3}, \frac{1}{3}, \frac{1}{3}\right)$. The probabilities that A, B, and C will have the ball are equal to the three
components of $vP^2$. Since the third component is largest, it is most likely that C will have the
ball at time n + 2.
13. The states are triples of possible outcomes: (HHH), (HHT), (HTH), etc. There are a total of eight
such triples. The conditional probabilities of the possible values of the outcomes on trials (n − 1, n, n + 1)
given all trials up to time n depend only on the trials (n − 2, n − 1, n) and not on n itself, hence we
have a Markov chain with stationary transition probabilities. Every row of the transition matrix has
the following form except the two corresponding to (HHH) and (TTT). Let a, b, c stand for three
arbitrary elements of {H, T}, not all equal. The row for (abc) has 0 in every column except for the two
columns (bcH) and (bcT), which have 1/2 in each. In the (HHH) row, every column has 0 except the
(HHT) column, which has 1. In the (TTT) row, every column has 0 except the (TTH) column, which
has 1.


14. Since we switch a pair of balls during each operation, there are always three balls in box A during this
process. There are a total of nine red balls available, so there are four possible states of the proposed 
Markov chain, 0, 1, 2, 3, each state giving the number of red balls in box A. The possible compositions 
of box A after the nth operation clearly depend only on the composition after the n — 1st operation, so 
we have a Markov chain. Also, balls are drawn at random during each operation, so the probabilities of 
transition depend only on the current state. Hence, the transition probabilities are stationary. If there 
are currently 0 red balls in box A, then we shall certainly remove a green ball. The probability that we 
get a red ball from box B is 9/10, otherwise we stay in state 0. So, the first row of P is (1/10, 9/10, 0,0). 
If we start with 1 red ball, then we remove that ball with probability 1/3. We replace whatever we 
draw with a red ball with probability 8/10. So we can either go to state 0 (probability 1/3 x 2/10), 
stay in state 1 (probability 1/3 x 8/10 + 2/3 x 2/10), or go to state 2 (probability 2/3 x 8/10). The 
second row of P is (1/15, 2/5,8/15,0). If we start with 2 red balls, we remove one with probability 2/3 
and we replace it with red with probability 7/10. So, the third row of P is (0,1/5,17/30,7/30). If we 
start with 3 red balls, we certainly remove one and we replace it by red with probability 6/10, so the 
fourth row of P is (0,0, 2/5, 3/5). 
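
The four rows derived above follow one pattern, which the sketch below makes explicit with exact fractions. It assumes what the solution's probabilities imply: box A holds 3 balls, box B holds 10, and there are 9 red balls in all, so that with i red balls in A the ball leaving A is red with probability i/3 and the ball entering from B is red with probability (9 − i)/10. The function and parameter names are hypothetical.

```python
from fractions import Fraction


def transition_row(i, box_a_size=3, total_red=9, box_b_size=10):
    """Row i of P, where the state i is the number of red balls in box A."""
    out_red = Fraction(i, box_a_size)               # ball removed from A is red
    in_red = Fraction(total_red - i, box_b_size)    # ball drawn from B is red
    row = [Fraction(0)] * (box_a_size + 1)
    if i > 0:
        row[i - 1] = out_red * (1 - in_red)         # lose a red, gain a green
    if i < box_a_size:
        row[i + 1] = (1 - out_red) * in_red         # lose a green, gain a red
    row[i] = out_red * in_red + (1 - out_red) * (1 - in_red)   # colors match
    return row


P = [transition_row(i) for i in range(4)]
for row in P:
    print([str(x) for x in row])
```

Working with Fraction keeps the entries exact, so they can be compared directly with the rows stated in the solution.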


15. We are asked to verify the numbers in the second and fifth rows of the matrix in Example 3.10.6. For
the second row, the parents have genotypes AA and Aa, so that the only possible offspring are AA and
Aa. Each of these occurs with probability 1/2 because they are determined by which allele comes from
the Aa parent. Since the two offspring in the second generation are independent, we will get {AA, AA}
with probability $(1/2)^2 = 1/4$ and we will get {Aa, Aa} with probability 1/4 also. The remaining
probability, 1/2, is the probability of {AA, Aa}. For the fifth row, the parents have genotypes Aa and
aa. The only possible offspring are Aa and aa. Indeed, the situation is identical to the second row with
a and A switched. The resulting probabilities are also the same after this same switch.


16. We have to multiply the initial probability vector into the transition matrix and do the arithmetic. For
the first coordinate, we obtain

\[
\frac{1}{16}\times 1 + \frac{1}{4}\times\frac{1}{4} + \frac{1}{4}\times\frac{1}{16} = \frac{9}{64}.
\]

The other five elements are calculated in a similar fashion. The resulting vector is

\[
\left(\frac{9}{64}, \frac{3}{16}, \frac{1}{32}, \frac{5}{16}, \frac{3}{16}, \frac{9}{64}\right).
\]
17. (a) We are asked to find the conditional distribution of $X_n$ given $X_{n-1} = \{Aa, aa\}$ and $X_{n+1} = \{AA, aa\}$. For each possible state $x_n$, we can find

\[
\Pr(X_n = x_n \mid X_{n-1} = \{Aa, aa\}, X_{n+1} = \{AA, aa\})
= \frac{\Pr(X_n = x_n, X_{n+1} = \{AA, aa\} \mid X_{n-1} = \{Aa, aa\})}{\Pr(X_{n+1} = \{AA, aa\} \mid X_{n-1} = \{Aa, aa\})}. \tag{S.3.4}
\]

The denominator is 0.0313 from the 2-step transition matrix in Example 3.10.9. The numerator
is the product of two terms from the 1-step transition matrix: one from {Aa, aa} to $x_n$ and the
other from $x_n$ to {AA, aa}. These products are as follows:

\[
\begin{array}{l|cccccc}
x_n & \{AA,AA\} & \{AA,Aa\} & \{AA,aa\} & \{Aa,Aa\} & \{Aa,aa\} & \{aa,aa\}\\ \hline
\text{Numerator} & 0 & 0 & 0 & 0.25 \times 0.125 & 0 & 0
\end{array}
\]

Plugging these into (S.3.4) gives

\[
\Pr(X_n = \{Aa, Aa\} \mid X_{n-1} = \{Aa, aa\}, X_{n+1} = \{AA, aa\}) = 1,
\]

and all other states have probability 0.

(b) This time, we want

\[
\Pr(X_n = x_n \mid X_{n-1} = \{Aa, aa\}, X_{n+1} = \{aa, aa\})
= \frac{\Pr(X_n = x_n, X_{n+1} = \{aa, aa\} \mid X_{n-1} = \{Aa, aa\})}{\Pr(X_{n+1} = \{aa, aa\} \mid X_{n-1} = \{Aa, aa\})}.
\]

The denominator is 0.3906. The numerator products and their ratios to the denominator are:

\[
\begin{array}{l|cccccc}
x_n & \{AA,AA\} & \{AA,Aa\} & \{AA,aa\} & \{Aa,Aa\} & \{Aa,aa\} & \{aa,aa\}\\ \hline
\text{Numerator} & 0 & 0 & 0 & 0.25 \times 0.0625 & 0.5 \times 0.25 & 0.25 \times 1\\
\text{Ratio} & 0 & 0 & 0 & 0.0400 & 0.3200 & 0.6400
\end{array}
\]

This time, we get

\[
\Pr(X_n = x_n \mid X_{n-1} = \{Aa, aa\}, X_{n+1} = \{aa, aa\}) =
\begin{cases}
0.04 & \text{if } x_n = \{Aa, Aa\},\\
0.32 & \text{if } x_n = \{Aa, aa\},\\
0.64 & \text{if } x_n = \{aa, aa\},
\end{cases}
\]

and all others are 0.


18. We can see from the 2-step transition matrix that it is possible to get from every non-absorbing state 
into each of the absorbing states in two steps. So, no matter what non-absorbing state we start in, 
the probability is one that we will eventually end up in one of absorbing states. Hence, no distribution 
with positive probability on any non-absorbing state can be a stationary distribution. 


19. The matrix G and its inverse are

\[
G = \begin{pmatrix}
-0.3 & 1\\
0.6 & 1
\end{pmatrix},
\qquad
G^{-1} = -\frac{10}{9}\begin{pmatrix}
1 & -1\\
-0.6 & -0.3
\end{pmatrix}.
\]

The bottom row of $G^{-1}$ is (2/3, 1/3), the unique stationary distribution.
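
The same computation can be sketched in code for any two-state chain. The construction assumed here is the one visible in the matrix G of this solution: take P − I, replace its last column by ones, invert, and read off the bottom row. The weather matrix with rows (0.7, 0.3) and (0.6, 0.4) is an assumption consistent with the entries −0.3 and 0.6 of G, and the function name is hypothetical.

```python
def stationary_2x2(p):
    """Stationary distribution of a 2-state chain as the bottom row of G^{-1},
    where G is P - I with its last column replaced by ones."""
    g = [[p[0][0] - 1.0, 1.0],
         [p[1][0], 1.0]]
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    # The inverse of [[a, b], [c, d]] is (1 / det) * [[d, -b], [-c, a]].
    inv = [[g[1][1] / det, -g[0][1] / det],
           [-g[1][0] / det, g[0][0] / det]]
    return inv[1]


P = [[0.7, 0.3],
     [0.6, 0.4]]
pi = stationary_2x2(P)
print(pi)
```

Multiplying the resulting vector by P returns the same vector, which is the defining property of a stationary distribution.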


20. The argument is essentially the same as in Exercise 18. All probability in non-absorbing states eventu- 
ally moves into the absorbing states after sufficiently many transitions. 
3.11 Supplementary Exercises 
Solution to Exercises 
1. We can calculate the c.d.f. of Z directly.

\[
\begin{aligned}
F(z) = \Pr(Z \le z) &= \Pr(Z = X)\Pr(X \le z) + \Pr(Z = Y)\Pr(Y \le z) \\
&= \frac{1}{2}\Pr(X \le z) + \frac{1}{2}\Pr(Y \le z).
\end{aligned}
\]

The graph is in Fig. S.3.35.

Figure S.3.35: Graph of c.d.f. for Exercise 1 of Sec. 3.11.


2. Let $x_1,\ldots,x_k$ be the finitely many values for which $f_1(x) > 0$. Since X and Y are independent, the
conditional distribution of Z = X + Y given X = x is the same as the distribution of x + Y, which
has the p.d.f. $f_2(z - x)$, and the c.d.f. $F_2(z - x)$. By the law of total probability the c.d.f. of Z is
$\sum_{i=1}^{k} F_2(z - x_i) f_1(x_i)$. Notice that this is a weighted average of continuous functions of z, $F_2(z - x_i)$
for i = 1, ..., k, hence it is a continuous function. The p.d.f. of Z can easily be found by differentiating
the c.d.f. to obtain $\sum_{i=1}^{k} f_2(z - x_i) f_1(x_i)$.


3. Since F(x) is continuous and differentiable everywhere except at the points x = 0, 1, and 2,

\[
f(x) = \frac{dF(x)}{dx} =
\begin{cases}
\dfrac{2}{5} & \text{for } 0 < x < 1,\\[2mm]
\dfrac{3}{5} & \text{for } 1 < x < 2,\\[2mm]
0 & \text{otherwise.}
\end{cases}
\]

Figure S.3.36: Graph of c.d.f. for Exercise 3 of Sec. 3.11.


4. Since f(x) is symmetric with respect to x = 0, F(0) = Pr(X ≤ 0) = 0.5. Hence,

\[
\int_0^{x_0} f(x)\,dx = \frac{1}{2}\int_0^{x_0} \exp(-x)\,dx = 0.4.
\]

It follows that $\exp(-x_0) = 0.2$ and $x_0 = \log 5$.


5. $X_1$ and $X_2$ have the uniform distribution over the square, which has area 1. The area of the quarter
circle in Fig. S.3.37, which is the required probability, is π/4.

Figure S.3.37: Region for Exercise 5 of Sec. 3.11.


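6. (a) $\Pr(X \text{ divisible by } n) = f(n) + f(2n) + f(3n) + \cdots = \sum_{x=1}^{\infty} \dfrac{1}{c(p)(nx)^p} = \dfrac{1}{n^p}$.

(b) By part (a), $\Pr(X \text{ even}) = 1/2^p$. Therefore, $\Pr(X \text{ odd}) = 1 - 1/2^p$.

7. We have

\[
\begin{aligned}
\Pr(X_1 + X_2 \text{ even}) &= \Pr(X_1 \text{ even})\Pr(X_2 \text{ even}) + \Pr(X_1 \text{ odd})\Pr(X_2 \text{ odd}) \\
&= \left(\frac{1}{2^p}\right)\left(\frac{1}{2^p}\right) + \left(1 - \frac{1}{2^p}\right)\left(1 - \frac{1}{2^p}\right) \\
&= 1 - \frac{1}{2^{p-1}} + \frac{1}{2^{2p-1}}.
\end{aligned}
\]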

8. Let G(x) denote the c.d.f. of the time until the system fails, let A denote the event that component 1
is still operating at time x, and let B denote the event that at least one of the other three components
is still operating at time x. Then

\[
1 - G(x) = \Pr(\text{System still operating at time } x) = \Pr(A \cap B) = \Pr(A)\Pr(B) = [1 - F(x)][1 - F^3(x)].
\]

Hence, $G(x) = F(x)[1 + F^2(x) - F^3(x)]$.


9. Let A denote the event that the tack will land with its point up on all three tosses. Then
$\Pr(A \mid X = x) = x^3$. Hence,


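10. Let Y denote the area of the circle. Then $Y = \pi X^2$, so the inverse transformation is

\[
x = (y/\pi)^{1/2} \quad\text{and}\quad \frac{dx}{dy} = \frac{1}{2(\pi y)^{1/2}}.
\]

Also, if 0 < x < 2, then 0 < y < 4π. Thus, for 0 < y < 4π,

\[
g(y) = f\big[(y/\pi)^{1/2}\big]\,\frac{1}{2(\pi y)^{1/2}},
\]

and g(y) = 0 otherwise.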
11. $F(x) = 1 - \exp(-2x)$ for x > 0. Therefore, by the probability integral transformation, F(X) will have
the uniform distribution on the interval [0, 1]. Therefore,

$$Y = 5F(X) = 5(1 - \exp(-2X))$$

will have the uniform distribution on the interval [0, 5].

It might be noted that if Z has the uniform distribution on the interval [0, 1], then 1 − Z has the same
uniform distribution. Therefore,

$$Y = 5[1 - F(X)] = 5\exp(-2X)$$

will also have the uniform distribution on the interval [0, 5].
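The claim that Y = 5(1 − exp(−2X)) is uniform on [0, 5] can be checked directly from the c.d.f. The following is a minimal sketch, not part of the original solution; the function name `cdf_of_Y` is hypothetical. It inverts the increasing map from X to Y and evaluates F at the resulting point, which should return y/5.

```python
import math

def cdf_of_Y(y):
    """Pr(Y <= y) for Y = 5 * (1 - exp(-2 X)), where X has c.d.f. F(x) = 1 - exp(-2x)."""
    if y <= 0:
        return 0.0
    if y >= 5:
        return 1.0
    # The map x -> 5 * (1 - exp(-2x)) is increasing, so
    # Y <= y  is equivalent to  X <= -log(1 - y/5) / 2.
    x = -math.log(1.0 - y / 5.0) / 2.0
    return 1.0 - math.exp(-2.0 * x)

# Compare with the uniform c.d.f. y/5 on [0, 5].
for y in (0.5, 1.0, 2.5, 4.0):
    print(y, cdf_of_Y(y), y / 5.0)
```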


12. This exercise, in different words, is exactly the same as Exercise 7 of Sec. 1.7 and, therefore, the solution
is the same.


13. Only in (c) and (d) is the joint p.d.f. of X and Y positive over a rectangle with sides parallel to the
axes, so only in (c) and (d) is there the possibility of X and Y being independent. Since the uniform
density is constant, it can be regarded as being factored in the form of Eq. (3.5.7). Hence, X and Y
are independent in (c) and (d).


14. The required probability p is the probability of the shaded area in Fig. S.3.38. Therefore,

$$p = 1 - \iint f(x) f(y)\,dx\,dy = 1 - \frac{1}{3} = \frac{2}{3},$$

where the double integral is taken over the unshaded part of the region.

Figure S.3.38: Figure for Exercise 14 of Sec. 3.11.


15. This problem is similar to Exercise 11 of Sec. 3.5, but now we have Fig. S.3.39. The area of the shaded
region is now 550 + 787.5 = 1337.5. Hence, the required probability is 1337.5/3600 = 0.3715.


16. For 0 < x < 1,

$$f_1(x) = \int_x^1 2(x + y)\,dy = 1 + 2x - 3x^2.$$

Therefore, $\Pr(X < 1/2) = \int_0^{1/2} f_1(x)\,dx = 5/8$.
Finally, for 0 < x < y < 1,

$$g_2(y \mid x) = \frac{f(x, y)}{f_1(x)} = \frac{2(x + y)}{1 + 2x - 3x^2}.$$
Figure S.3.39: Figure for Exercise 15 of Sec. 3.11.


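17. $f(x, y) = f(x)\,g(y \mid x) = \dfrac{9y^2}{x}$ for 0 < y < x < 1.

Hence,

$$f_2(y) = \int_y^1 f(x, y)\,dx = -9y^2 \log(y) \quad\text{for } 0 < y < 1$$

and

$$g_1(x \mid y) = \frac{f(x, y)}{f_2(y)} = -\frac{1}{x \log(y)} \quad\text{for } 0 < y < x < 1.$$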


18. X and Y have the uniform distribution over the region shown in Fig. S.3.40. The area of this region is
4. The area in the second plus the fourth quadrants is 1. Therefore, the area in the first plus the third
quadrants is 3, and

$$\Pr(XY > 0) = \frac{3}{4}.$$

Figure S.3.40: Region for Exercise 18 of Sec. 3.11.

Furthermore, for any value of x (−1 < x < 1), the conditional distribution of Y given that X = x will
be a uniform distribution over the interval [x − 1, x + 1]. Hence,

$$g(y \mid x) = \begin{cases} \dfrac{1}{2} & \text{for } x - 1 < y < x + 1, \\ 0 & \text{otherwise.} \end{cases}$$


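19.

$$f_1(x) = \int_x^1 \int_y^1 6\,dz\,dy = 3 - 6x + 3x^2 = 3(1 - x)^2 \quad\text{for } 0 < x < 1,$$

$$f_2(y) = \int_0^y \int_y^1 6\,dz\,dx = 6y(1 - y) \quad\text{for } 0 < y < 1,$$

$$f_3(z) = \int_0^z \int_0^y 6\,dx\,dy = 3z^2 \quad\text{for } 0 < z < 1.$$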


20. Since f(x,y, z) can be factored in the form g(x, y)h(z) it follows that Z is independent of the random 
variables X and Y. Hence, the required conditional probability is the same as the unconditional 
probability Pr(3X > Y). Furthermore, it follows from the factorization just given that the marginal 
p.d.f. h(z) is constant for 0 < z < 1. Thus, this constant must be 1 and the marginal joint p.d.f. of X 
and Y must be simply g(x, y) = 2, for 0 < x < y < 1. Therefore,

$$\Pr(3X > Y) = \int_0^1 \int_{y/3}^{y} 2\,dx\,dy = \frac{2}{3}.$$

The range of integration is illustrated in Fig. S.3.41.

Figure S.3.41: Range of integration for Exercise 20 of Sec. 3.11.


21. (a) $$f(x, y) = \begin{cases} \exp(-(x + y)) & \text{for } x > 0,\ y > 0, \\ 0 & \text{otherwise.} \end{cases}$$

Also, x = uv and y = (1 − u)v, so

$$J = \det\begin{pmatrix} v & u \\ -v & 1 - u \end{pmatrix} = v > 0.$$

Therefore,

$$g(u, v) = f(uv, [1 - u]v)\,|J| = \begin{cases} v\exp(-v) & \text{for } 0 < u < 1,\ v > 0, \\ 0 & \text{otherwise.} \end{cases}$$

(b) Because g can be appropriately factored (the factor involving u is constant) and it is positive over
an appropriate rectangle, it follows that U and V are independent.


22. Here, x = uv and y = v, so

$$J = \det\begin{pmatrix} v & u \\ 0 & 1 \end{pmatrix} = v > 0.$$

Therefore,

$$g(u, v) = f(uv, v)\,|J| = \begin{cases} 8uv^3 & \text{for } 0 < u < 1,\ 0 < v < 1, \\ 0 & \text{otherwise.} \end{cases}$$

X and Y are not independent because f(x, y) > 0 over a triangle, not a rectangle. However, it can be
seen that U and V are independent.

23. Here, $f(x) = \dfrac{dF(x)}{dx} = \exp(-x)$ for x > 0. It follows from the results in Sec. 3.9 that

$$g(y_1, y_n) = n(n - 1)\,(\exp(-y_1) - \exp(-y_n))^{n-2} \exp(-(y_1 + y_n))$$

for $0 < y_1 < y_n$. Also, the marginal p.d.f. of $Y_n$ is

$$g_n(y_n) = n(1 - \exp(-y_n))^{n-1} \exp(-y_n) \quad\text{for } y_n > 0.$$

Hence,

$$h(y_1 \mid y_n) = \frac{(n - 1)(\exp(-y_1) - \exp(-y_n))^{n-2} \exp(-y_1)}{(1 - \exp(-y_n))^{n-1}} \quad\text{for } 0 < y_1 < y_n.$$


24. As in Example 3.9.7, let $W = Y_n - Y_1$ and $Z = Y_1$. The joint p.d.f. g(w, z) of (W, Z) is, for 0 < w < 1
and 0 < z < 1 − w,

$$g(w, z) = 24[(w + z)^2 - z^2]\,z\,(w + z) = 24w(2z^3 + 3wz^2 + w^2 z),$$

and 0 otherwise. Hence, the p.d.f. of the range is, for 0 < w < 1,

$$h(w) = \int_0^{1-w} g(w, z)\,dz = 12w(1 - w)^2.$$
0 


25. (a) Let $f_2$ be the marginal p.d.f. of Y. We approximate

$$\Pr(y - \epsilon < Y \le y + \epsilon) = \int_{y-\epsilon}^{y+\epsilon} f_2(t)\,dt \approx 2\epsilon f_2(y).$$

(b) For each s, we approximate

$$\int_{y-\epsilon}^{y+\epsilon} f(s, t)\,dt \approx 2\epsilon f(s, y).$$

Using this, we can approximate

$$\Pr(X \le x,\ y - \epsilon < Y \le y + \epsilon) = \int_{-\infty}^{x} \int_{y-\epsilon}^{y+\epsilon} f(s, t)\,dt\,ds \approx 2\epsilon \int_{-\infty}^{x} f(s, y)\,ds.$$

(c) Taking the ratio of the approximation in part (b) to the approximation in part (a), we obtain

$$\Pr(X \le x \mid y - \epsilon < Y \le y + \epsilon) = \frac{\Pr(X \le x,\ y - \epsilon < Y \le y + \epsilon)}{\Pr(y - \epsilon < Y \le y + \epsilon)} \approx \frac{\int_{-\infty}^{x} f(s, y)\,ds}{f_2(y)} = \int_{-\infty}^{x} g_1(s \mid y)\,ds.$$


26. (a) Let $Y = X_1$. The transformation is $Y = X_1$ and $Z = X_1 - X_2$. The inverse is $x_1 = y$ and
$x_2 = y - z$. The Jacobian has absolute value 1. The joint p.d.f. of (Y, Z) is

$$g(y, z) = \exp(-y - (y - z)) = \exp(-2y + z),$$

for y > 0 and z < y.
(b) The marginal p.d.f. of Z is

$$\int_{\max\{0, z\}}^{\infty} \exp(-2y + z)\,dy = \frac{1}{2}\exp(-|z|) = \frac{1}{2}\begin{cases} \exp(-z) & \text{if } z \ge 0, \\ \exp(z) & \text{if } z < 0. \end{cases}$$

The conditional p.d.f. of $Y = X_1$ given Z = 0 is the ratio of these two with z = 0, namely
$g_1(x_1 \mid 0) = 2\exp(-2x_1)$, for $x_1 > 0$.

(c) Let $Y = X_1$. The transformation is now $Y = X_1$ and $W = X_1/X_2$. The inverse is $x_1 = y$ and
$x_2 = y/w$. The Jacobian is

$$J = \det\begin{pmatrix} 1 & 0 \\ 1/w & -y/w^2 \end{pmatrix} = -\frac{y}{w^2}.$$

The joint p.d.f. of (Y, W) is

$$g(y, w) = \exp(-y - y/w)\,y/w^2 = y\exp(-y(1 + 1/w))/w^2,$$

for y, w > 0.

(d) The marginal p.d.f. of W is

$$\int_0^{\infty} y\exp(-y(1 + 1/w))/w^2\,dy = \frac{1}{w^2(1 + 1/w)^2} = \frac{1}{(1 + w)^2},$$

for w > 0. The conditional p.d.f. of $Y = X_1$ given W = 1 is the ratio of these two with w = 1,
namely $h_1(x_1 \mid 1) = 4x_1\exp(-2x_1)$, for $x_1 > 0$.

(e) The conditional p.d.f. $g_1$ in part (b) is supposed to be the conditional p.d.f. of $X_1$ given that Z
is close to 0, that is, that $|X_1 - X_2|$ is small. The conditional p.d.f. $h_1$ in part (d) is supposed
to be the conditional p.d.f. of $X_1$ given that W is close to 1, that is, that $|X_1/X_2 - 1|$ is small.
The sets of $(x_1, x_2)$ values such that $|x_1 - x_2|$ is small and that $|x_1/x_2 - 1|$ is small are drawn
in Fig. S.3.42. One can see how the two sets, although close, are different enough to account for
different conditional distributions.


27. The transition matrix is as follows:

                        Players in game n + 1
                        (A,B)    (A,C)    (B,C)
Players in     (A,B)     0        0.3      0.7
game n         (A,C)     0.6      0        0.4
               (B,C)     0.8      0.2      0

Figure S.3.42: Boundaries of the two regions where $|x_1 - x_2| < 0.1$ and $|x_1/x_2 - 1| < 0.1$ in Exercise 26e of
Sec. 3.11.


28. If A and B play in the first game, then there are the following two sequences of outcomes which result 
in their playing in the fourth game: 
i) A beats B in the first game, C beats A in the second game, B beats C in the third game; 
ii) B beats A in the first game, C beats B in the second game, A beats C in the third game. 


The probability of the first sequence is (0.3) (0.4) (0.8) = 0.096. The probability of the second sequence 
is (0.7) (0.2) (0.6) = 0.084. Therefore, the overall probability that A and B will play again in the fourth 
game is 0.18. The same sort of calculations show that this answer will be the same if A and C play in
the first game or if B and C play in the first game.
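The three-game return probability can also be read off the diagonal of the cube of the transition matrix. The sketch below is an illustration only. It assumes the states are ordered (A,B), (A,C), (B,C) and builds the rows from the win probabilities quoted in this solution (A beats B with probability 0.3, C beats A with probability 0.4, B beats C with probability 0.8). The helper `matmul` is a hypothetical name.

```python
# States in order: (A,B), (A,C), (B,C). The winner of each game plays the
# player who sat out, which determines the next pair.
P = [
    [0.0, 0.3, 0.7],  # from (A,B): A wins -> (A,C), B wins -> (B,C)
    [0.6, 0.0, 0.4],  # from (A,C): A wins -> (A,B), C wins -> (B,C)
    [0.8, 0.2, 0.0],  # from (B,C): B wins -> (A,B), C wins -> (A,C)
]

def matmul(X, Y):
    """Plain-Python product of two square matrices of the same size."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P3 = matmul(matmul(P, P), P)

# Probability that the pair playing in game 1 plays again in game 4,
# for each possible starting pair.
return_probs = [P3[i][i] for i in range(3)]
print(return_probs)
```

Each diagonal entry is the sum of the two three-step cycles through the other two states, which is why the three values agree.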


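29. The matrix G and its inverse are

$$G = \begin{pmatrix} -1.0 & 0.3 & 1.0 \\ 0.6 & -1.0 & 1.0 \\ 0.8 & 0.2 & 1.0 \end{pmatrix},
\qquad
G^{-1} = \begin{pmatrix} -0.5505 & -0.0459 & 0.5963 \\ 0.0917 & -0.8257 & 0.7339 \\ 0.4220 & 0.2018 & 0.3761 \end{pmatrix}.$$

The bottom row of $G^{-1}$ is the unique stationary distribution, (0.4220, 0.2018, 0.3761).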
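As a cross-check on the stationary distribution, one can iterate v ← vP until it stops changing. This is a different technique from the matrix-inverse method used in the solution, shown only as a sketch. The transition matrix is the one implied by the win probabilities in Exercise 28, and the names `step` and `v` are hypothetical.

```python
# Transition matrix over the states (A,B), (A,C), (B,C).
P = [
    [0.0, 0.3, 0.7],
    [0.6, 0.0, 0.4],
    [0.8, 0.2, 0.0],
]

def step(v, P):
    """One step of the chain: return the row vector v P."""
    n = len(v)
    return [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]

# Start from the uniform distribution and iterate; the chain is aperiodic
# and irreducible, so the iterates converge to the stationary distribution.
v = [1.0 / 3.0] * 3
for _ in range(500):
    v = step(v, P)

print([round(x, 4) for x in v])
```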
Chapter 4


Expectation 


4.1 The Expectation of a Random Variable 


Commentary 


It is useful to stress the fact that the expectation of a random variable depends only on the distribution 
of the random variable. Every two random variables with the same distribution will have the same mean. 
This also applies to variance (Sec. 4.3), other moments and m.g.f. (Sec. 4.4), and median (Sec. 4.5). For this 
reason, one often refers to means, variance, quantiles, etc. of a distribution rather than of a random variable. 
One need not even have a random variable in mind in order to calculate the mean of a distribution. 


Solutions to Exercises 


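1. The mean of X is

$$E(X) = \int_a^b x f(x)\,dx = \int_a^b \frac{x}{b - a}\,dx = \frac{a + b}{2}.$$


2. $E(X) = \dfrac{1}{100}(1 + 2 + \cdots + 100) = \dfrac{1}{100}\cdot\dfrac{(100)(101)}{2} = 50.5.$


3. The total number of students is 50. Therefore,

$$E(X) = 18\left(\frac{20}{50}\right) + 19\left(\frac{22}{50}\right) + 20\left(\frac{4}{50}\right) + 21\left(\frac{3}{50}\right) + 25\left(\frac{1}{50}\right) = 18.92.$$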

4. There are eight words in the sentence and they are each equally probable. Therefore, the possible values
of X and their probabilities are as follows:

x | f(x)
2 | 1/8
3 | 5/8
4 | 1/8
9 | 1/8

It follows that $E(X) = 2\left(\frac{1}{8}\right) + 3\left(\frac{5}{8}\right) + 4\left(\frac{1}{8}\right) + 9\left(\frac{1}{8}\right) = 3.75$.

5. There are 30 letters and they are each equally probable:
2 letters appear in the only two-letter word;
15 letters appear in three-letter words;
4 letters appear in the only four-letter word;
9 letters appear in the only nine-letter word.

Therefore, the possible values of Y and their probabilities are as follows:

y | g(y)
2 | 2/30
3 | 15/30
4 | 4/30
9 | 9/30

It follows that $E(Y) = 2\left(\frac{2}{30}\right) + 3\left(\frac{15}{30}\right) + 4\left(\frac{4}{30}\right) + 9\left(\frac{9}{30}\right) = \frac{73}{15}$.


7. $E\left(\dfrac{1}{X}\right) = \displaystyle\int_0^1 \frac{1}{x}\,dx = -\lim_{x \to 0} \log(x) = \infty$. Since the integral is not finite, $E(1/X)$ does not exist.


8. $E(XY) = \displaystyle\int_0^1 \int_0^x xy \cdot 12y^2\,dy\,dx = \frac{1}{2}.$


9. If X denotes the point at which the stick is broken, then X has the uniform distribution on the interval
[0, 1]. If Y denotes the length of the longer piece, then Y = max{X, 1 − X}. Therefore,

$$E(Y) = \int_0^1 \max(x, 1 - x)\,dx = \int_0^{1/2} (1 - x)\,dx + \int_{1/2}^1 x\,dx = \frac{3}{4}.$$


10. Since $\alpha$ has the uniform distribution on the interval $[-\pi/2, \pi/2]$, the p.d.f. of $\alpha$ is

$$f(\alpha) = \begin{cases} \dfrac{1}{\pi} & \text{for } -\dfrac{\pi}{2} < \alpha < \dfrac{\pi}{2}, \\ 0 & \text{otherwise.} \end{cases}$$

Figure S.4.1: Figure for Exercise 10 of Sec. 4.1.

Also, $Y = \tan(\alpha)$. Therefore, the inverse transformation is $\alpha = \tan^{-1} Y$ and $d\alpha/dy = 1/(1 + y^2)$. As
$\alpha$ varies over the interval $(-\pi/2, \pi/2)$, Y varies over the entire real line. Therefore, for $-\infty < y < \infty$,
the p.d.f. of Y is

$$g(y) = f(\tan^{-1} y)\,\frac{1}{1 + y^2} = \frac{1}{\pi(1 + y^2)}.$$


11. The p.d.f.’s of $Y_1$ and $Y_n$ were found in Sec. 3.9. For the given uniform distribution, the p.d.f. of $Y_1$ is

$$g_1(y) = \begin{cases} n(1 - y)^{n-1} & \text{for } 0 < y < 1, \\ 0 & \text{otherwise.} \end{cases}$$

Therefore,

$$E(Y_1) = \int_0^1 y\,n(1 - y)^{n-1}\,dy = \frac{1}{n + 1}.$$

The p.d.f. of $Y_n$ is

$$g_n(y) = \begin{cases} n y^{n-1} & \text{for } 0 < y < 1, \\ 0 & \text{otherwise.} \end{cases}$$

Therefore,

$$E(Y_n) = \int_0^1 y \cdot n y^{n-1}\,dy = \frac{n}{n + 1}.$$


12. It follows from the probability integral transformation that the joint distribution of $F(X_1), \ldots, F(X_n)$
is the same as the joint distribution of a random sample from the uniform distribution on the interval
[0, 1]. Since $F(Y_1)$ is the smallest of these values and $F(Y_n)$ is the largest, the distributions of these
variables will be the same as the distributions of the smallest and the largest values in a random sample
from the uniform distribution on the interval [0, 1]. Therefore, $E[F(Y_1)]$ and $E[F(Y_n)]$ will be equal to
the values found in Exercise 11.


13. Let p = Pr(X = 300). Then E(X) = 300p + 100(1 − p) = 200p + 100. For risk-neutrality, we need
E(X) = 110 × (1.058) = 116.38. Setting 200p + 100 = 116.38 yields p = 0.0819. The option has a value
of 150 if X = 300 and it has a value of 0 if X = 100, so the mean of the option value is 150p = 12.285.
The present value of this amount is 12.285/1.058 = 11.61, the risk-neutral price of the option.
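The arithmetic above can be reproduced in a few lines. This is a hedged sketch rather than the text's own method: the function name `risk_neutral_price` and its parameters are hypothetical, and the payoff is modelled as a call with strike 150, which is an assumption chosen only because it reproduces the stated option values of 150 and 0.

```python
def risk_neutral_price(up, down, current, rate, strike):
    """Two-point risk-neutral pricing for a call option (illustrative only).

    up, down : the two possible future stock prices
    current  : current stock price
    rate     : risk-free interest rate over the period
    strike   : assumed strike price of the call
    """
    growth = 1.0 + rate
    # Risk-neutrality: p * up + (1 - p) * down = current * growth.
    p = (current * growth - down) / (up - down)
    mean_payoff = p * max(up - strike, 0.0) + (1.0 - p) * max(down - strike, 0.0)
    return p, mean_payoff / growth

p, price = risk_neutral_price(up=300.0, down=100.0, current=110.0,
                              rate=0.058, strike=150.0)
print(round(p, 4), round(price, 2))
```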


14. For convenience, we shall not use dollar signs in these calculations.

(a) We need to check the investor’s net worth at the end of the year in four situations:
i. X = 180 and she makes the transactions;
ii. X = 180 and she doesn’t make the transactions;
iii. X = 260 and she makes the transactions;
iv. X = 260 and she doesn’t make the transactions.

Since we don’t know the investor’s entire net worth, we shall only calculate it relative to all other
investments. This means that we only need to pretend as if the investor had one share of the
stock worth 200. We don’t care what else she has. We need to show that cases (i) and (ii) lead
to the same net worth and that cases (iii) and (iv) lead to the same net worth. In case (ii), her
net worth will change by −20. In case (iv), her net worth will change by 60. In case (i), nobody
will exercise the options. So she will sell the three extra shares for 180 each (total 540) and pay
the loan of 519.24 plus interest 20.77 for a net 0.01 loss. Plus her one original share of stock has
lost 20, so her net worth has changed a total of −20.01, which is the same as case (ii) except for
the accumulated rounding error. In case (iii), the options will be exercised, and she will receive
800 for four shares of the stock. She will have to pay back the loan of 519.24 plus 20.77 in interest
for a net gain of 259.99. But she no longer has the one share of stock that was worth 200, so her
change in net worth is 59.99, the same as case (iv) to within the same one cent of rounding.

(b) If the option price is x < 20.19, then the investor only receives 4x for selling the options, but still
needs to pay 600 for the three shares, so she must borrow 600 − 4x. The rest of the calculations
proceed just as in part (a), but we must replace 519.24 by 600 − 4x, and the interest 20.77 must be
replaced by 0.04(600 − 4x). That is, to pay back the loan with interest, she must pay 624 − 4.16x
instead of 540.01. So she pays an additional 83.99 − 4.16x relative to the situation in part (a)
regardless of what happens to the stock price.

(c) This situation is the same as part (b) except now the value of 83.99 − 4.16x is negative instead
of positive, so the investor pays back less and hence makes additional profit rather than suffers
additional loss.


15. The value of the option is 0 if X = 260 and it is 40 if X = 180, so the expected value of the option is 


40(1 — p) = 40 x 0.65 = 26. The present value of this amount is 26/1.04 = 25. 


16. If f is the p.f. of X, and Y = |X|, then for y > 0, Pr(Y = y) = Pr(X = y) + Pr(X = −y). In
Example 4.1.4, Pr(X = y) = Pr(X = −y) = 1/[2y(y + 1)], and this makes Pr(Y = y) the p.f. in
Example 4.1.5.


4.2 Properties of Expectations 


Commentary 


Be sure to stress the fact that Theorem 4.2.6 on the expected value of a product of random variables has the 
condition that the random variables are independent. This section ends with a derivation for the expectation 
of a nonnegative discrete random variable. Although this method has theoretical interest, it is not central to 
the rest of the text. 


Solutions to Exercises 


1. The random variable Y is equal to 10(R − 1.5) in dollars. The mean of Y is 10[E(R) − 1.5]. From
Exercise 1 in Sec. 4.1, we know that E(R) = (−3 + 7)/2 = 2, so E(Y) = 5.


2. $E(2X_1 - 3X_2 + X_3 - 4) = 2E(X_1) - 3E(X_2) + E(X_3) - 4 = 2(5) - 3(5) + 5 - 4 = -4.$


3.

$$E[(X_1 - 2X_2 + X_3)^2] = E(X_1^2 + 4X_2^2 + X_3^2 - 4X_1X_2 + 2X_1X_3 - 4X_2X_3)
= E(X_1^2) + 4E(X_2^2) + E(X_3^2) - 4E(X_1X_2) + 2E(X_1X_3) - 4E(X_2X_3).$$

Since $X_1$, $X_2$, and $X_3$ are independent,

$$E(X_iX_j) = E(X_i)E(X_j) \quad\text{for } i \ne j.$$

Therefore, the above expectation can be written in the form:

$$E(X_1^2) + 4E(X_2^2) + E(X_3^2) - 4E(X_1)E(X_2) + 2E(X_1)E(X_3) - 4E(X_2)E(X_3).$$

Also, since each $X_i$ has the uniform distribution on the interval [0, 1], then $E(X_i) = \frac{1}{2}$ and

$$E(X_i^2) = \int_0^1 x^2\,dx = \frac{1}{3}.$$

Hence, the desired expectation has the value 1/2.


4. The area of the rectangle is XY. Since X and Y are independent, E(XY) = E(X)E(Y). Also,
E(X) = 1/2 and E(Y) = 7. Therefore, E(XY) = 7/2.


5. For i = 1, ..., n, let $Y_i = 1$ if the observation $X_i$ falls within the interval (a, b), and let $Y_i = 0$ otherwise.
Then $E(Y_i) = \Pr(Y_i = 1) = \int_a^b f(x)\,dx$. The total number of observations that fall within the interval
(a, b) is $Y_1 + \cdots + Y_n$, and

$$E(Y_1 + \cdots + Y_n) = E(Y_1) + \cdots + E(Y_n) = n\int_a^b f(x)\,dx.$$


6. Let $X_i = 1$ if the ith jump of the particle is one unit to the right and let $X_i = -1$ if the ith jump is
one unit to the left. Then, for i = 1, ..., n,

$$E(X_i) = (-1)p + (1)(1 - p) = 1 - 2p.$$

The position of the particle after n jumps is $X_1 + \cdots + X_n$, and

$$E(X_1 + \cdots + X_n) = E(X_1) + \cdots + E(X_n) = n(1 - 2p).$$


7. For i = 1, ..., n, let $X_i = 2$ if the gambler’s fortune is doubled on the ith play of the game and let
$X_i = 1/2$ if his fortune is cut in half on the ith play. Then

$$E(X_i) = 2\left(\frac{1}{2}\right) + \left(\frac{1}{2}\right)\left(\frac{1}{2}\right) = \frac{5}{4}.$$

After the first play of the game, the gambler’s fortune will be $cX_1$, after the second play it will be
$(cX_1)X_2$, and by continuing in this way it is seen that after n plays the gambler’s fortune will be
$cX_1X_2 \cdots X_n$. Since $X_1, \ldots, X_n$ are independent,

$$E(cX_1 \cdots X_n) = cE(X_1) \cdots E(X_n) = c\left(\frac{5}{4}\right)^n.$$


8. It follows from Example 4.2.4 that $E(X) = \frac{16}{5}$.
Since Y = 8 − X, $E(Y) = 8 - E(X) = \frac{24}{5}$. Finally, $E(X - Y) = E(X) - E(Y) = -\frac{8}{5}$.
9. We know that E(X) = np. Since Y = n − X, E(X − Y) = E(2X − n) = 2E(X) − n = n(2p − 1).


10. (a) Since the probability of success on any trial is p = 1/2, it follows from the material presented at 
the end of this section that the expected number of tosses is 1/p = 2. 


(b) The number of tails that will be obtained is equal to the total number of tosses minus one (the 
final head). Therefore, the expected number of tails is 2—1 = 1. 


11. We shall use the notation presented in the hint for this exercise. It follows from part (a) of Exercise 10
that $E(X_i) = 2$ for i = 1, ..., k. Therefore,

$$E(X) = E(X_1) + \cdots + E(X_k) = 2k.$$


12. (a) We need the p.d.f. of $X = 54R_1 + 110R_2$, where $R_1$ has the uniform distribution on the interval
[−10, 20] and $R_2$ has the uniform distribution on the interval [−4.5, 10]. We can rewrite X as
$X_1 + X_2$, where $X_1 = 54R_1$ has the uniform distribution on the interval [−540, 1080] and $X_2 =
110R_2$ has the uniform distribution on the interval [−495, 1100]. Let $f_i$ be the p.d.f. of $X_i$ for
i = 1, 2, and use the same technique as in Example 3.9.5. First, compute

$$f_1(z)f_2(x - z) = \begin{cases} 3.87 \times 10^{-7} & \text{for } -540 < z < 1080 \text{ and } -495 < x - z < 1100, \\ 0 & \text{otherwise.} \end{cases}$$

We need to integrate this over z for each fixed x. The set of x for which the function above is ever
positive is the interval [−1035, 2180]. For −1035 < x < 560, we must integrate z from −540 to
x + 495. For 560 < x < 585, we must integrate z from x − 1100 to x + 495. For 585 < x < 2180,
we must integrate z from x − 1100 to 1080. The resulting integral is

$$g(x) = \begin{cases} 3.87 \times 10^{-7}x + 4.01 \times 10^{-4} & \text{for } -1035 < x < 560, \\ 6.17 \times 10^{-4} & \text{for } 560 < x < 585, \\ 8.44 \times 10^{-4} - 3.87 \times 10^{-7}x & \text{for } 585 < x < 2180. \end{cases}$$

We need the negative of the 0.03 quantile. For −1035 < x < 560, the c.d.f. of X is

$$F(x) = \frac{3.87 \times 10^{-7}(x^2 - 1035^2)}{2} + 4.01 \times 10^{-4}(x + 1035).$$

This function is a second-degree polynomial in x. To be sure that the 0.03 quantile is between
−1035 and 560, we compute F(−1035) = 0 and F(560) = 0.493, which assures us that the 0.03
quantile is in this range. Setting F(x) = 0.03 and solving for x using the quadratic formula yields
x = −642.4, so VaR is 642.4.
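The quantile calculation can be sketched numerically. On the lowest linear piece of the density, the c.d.f. simplifies to c(x + 1035)²/2, where c is the product of the two uniform densities, so the quantile has a closed form. The code below is an illustration under that observation, with hypothetical names `cdf_lower` and `quantile_lower`. It uses the unrounded constant c = 1/(1620 × 1595), so its answer (about −641.3) differs slightly from the −642.4 obtained above with coefficients rounded to three significant figures.

```python
import math

# Supports of X1 = 54 R1 and X2 = 110 R2.
lo1, hi1 = -540.0, 1080.0
lo2, hi2 = -495.0, 1100.0

# Product of the two uniform densities (about 3.87e-7).
c = 1.0 / ((hi1 - lo1) * (hi2 - lo2))
x_min = lo1 + lo2                               # -1035
x_break = x_min + min(hi1 - lo1, hi2 - lo2)     # 560, end of the first linear piece

def cdf_lower(x):
    """C.d.f. of X = X1 + X2, valid only for x_min <= x <= x_break."""
    return c * (x - x_min) ** 2 / 2.0

def quantile_lower(p):
    """Inverse of cdf_lower, valid only when p <= cdf_lower(x_break)."""
    return x_min + math.sqrt(2.0 * p / c)

assert 0.03 <= cdf_lower(x_break)   # the 0.03 quantile lies on the first piece
q = quantile_lower(0.03)
value_at_risk = -q
print(round(q, 2), round(value_at_risk, 2))
```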


13. Use Taylor’s theorem with remainder to write

$$g(X) = g(\mu) + (X - \mu)g'(\mu) + \frac{(X - \mu)^2}{2}g''(Y), \qquad \text{(S.4.1)}$$

where $\mu = E(X)$ and Y is between X and $\mu$. Take the mean of both sides of (S.4.1). We get

$$E[g(X)] = g(\mu) + 0 + E\left[\frac{(X - \mu)^2}{2}g''(Y)\right].$$

The random variable whose mean is on the far right is nonnegative, hence the mean is nonnegative and
$E[g(X)] \ge g(\mu)$.
4.3 Variance


Commentary


Be sure to stress the fact that Theorem 4.3.5 on the variance of a sum of random variables has the condition 
that the random variables are independent. 


Solutions to Exercises 


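1. We found in Exercise 1 of Sec. 4.1 that E(X) = (0 + 1)/2 = 1/2. We can find

$$E(X^2) = \int_0^1 x^2\,dx = \frac{1}{3}.$$

So Var(X) = 1/3 − (1/2)² = 1/12.


2. The p.f. of X was found in Exercise 4 of Sec. 4.1, where E(X) = 15/4. From the same p.f.
we also find that

$$E(X^2) = 2^2\left(\frac{1}{8}\right) + 3^2\left(\frac{5}{8}\right) + 4^2\left(\frac{1}{8}\right) + 9^2\left(\frac{1}{8}\right) = \frac{73}{4}.$$

Therefore, $\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \dfrac{73}{4} - \left(\dfrac{15}{4}\right)^2 = \dfrac{67}{16}$.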


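3. The p.d.f. of this distribution is

$$f(x) = \begin{cases} \dfrac{1}{b - a} & \text{for } a < x < b, \\ 0 & \text{otherwise.} \end{cases}$$

Therefore, $E(X) = \dfrac{a + b}{2}$ and

$$E(X^2) = \int_a^b \frac{x^2}{b - a}\,dx = \frac{b^3 - a^3}{3(b - a)} = \frac{1}{3}(b^2 + ab + a^2).$$

It follows that $\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \frac{1}{12}(b - a)^2$.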


4. $E[X(X - 1)] = E(X^2 - X) = E(X^2) - \mu = \mathrm{Var}(X) + [E(X)]^2 - \mu = \sigma^2 + \mu^2 - \mu.$

5. $E[(X - c)^2] = E(X^2) - 2cE(X) + c^2 = \mathrm{Var}(X) + [E(X)]^2 - 2c\mu + c^2 = \sigma^2 + \mu^2 - 2c\mu + c^2 = \sigma^2 + (\mu - c)^2.$

6. Since E(X) = E(Y), E(X − Y) = 0. Therefore,

$$E[(X - Y)^2] = \mathrm{Var}(X - Y) = \mathrm{Var}[X + (-Y)].$$

Since X and −Y are independent, it follows that

$$E[(X - Y)^2] = \mathrm{Var}(X) + \mathrm{Var}(-Y) = \mathrm{Var}(X) + \mathrm{Var}(Y).$$


7. (a) Since X and Y are independent, Var(X − Y) = Var(X) + Var(Y) = 3 + 3 = 6.
(b) Var(2X − 3Y + 1) = 2² Var(X) + 3² Var(Y) = 4(3) + 9(3) = 39.
8. Consider a p.d.f. of the form

$$f(x) = \begin{cases} \dfrac{c}{x^3} & \text{for } x \ge 1, \\ 0 & \text{for } x < 1. \end{cases}$$

Then $E(X) = \int_1^{\infty} \dfrac{c}{x^2}\,dx$ is finite, but

$$E(X^2) = \int_1^{\infty} \frac{c}{x}\,dx \text{ is not finite.}$$

Therefore, E(X) is finite but E(X²) is not. Therefore, Var(X) is not finite.


9. The mean of X is (n + 1)/2, and the mean of X² is $\sum_{k=1}^{n} k^2/n = (n + 1)(2n + 1)/6$. So,

$$\mathrm{Var}(X) = \frac{(n + 1)(2n + 1)}{6} - \frac{(n + 1)^2}{4} = \frac{n^2 - 1}{12}.$$


10. The example efficient portfolio has $s_1 = 524.7$, $s_2 = 609.7$, and $s_3 = 39250$.

(a) We know that $R_1$ has a mean of 6 and a variance of 55, while $R_2$ has a mean of 4 and a variance of
28. Since we are assuming that $R_i$ has the uniform distribution on the interval $[a_i, b_i]$ for i = 1, 2,
we know that

$$E(R_i) = \frac{a_i + b_i}{2}, \qquad \mathrm{Var}(R_i) = \frac{(b_i - a_i)^2}{12}.$$

(See Exercise 1 of this section for the variance of a uniform distribution.) For i = 1, we set
$(a_1 + b_1)/2 = 6$ and $(b_1 - a_1)^2/12 = 55$. The solution is $a_1 = -6.845$ and $b_1 = 18.845$. For i = 2,
we set $(a_2 + b_2)/2 = 4$ and $(b_2 - a_2)^2/12 = 28$. The solution is $a_2 = -5.165$ and $b_2 = 13.165$.

(b) Let $X_1 = s_1R_1$ and $X_2 = s_2R_2$. Then the distribution of $X_1$ is the uniform distribution on the
interval [−3591.6, 9888.0], and $X_2$ has the uniform distribution on the interval [−3149.1, 8026.7].
The value of the return on the portfolio is $Y = X_1 + X_2 + 1413$. We need to find the 0.03 quantile
of Y. As in Exercise 12 of Sec. 4.2, the p.d.f. of Y will be linear for the lowest values of y. Those
values are −5327.7 < y < 5848.1. The line is $g(y) = 6.638 \times 10^{-9}y + 3.537 \times 10^{-5}$. In this range,
the c.d.f. is

$$G(y) = \frac{6.638 \times 10^{-9}}{2}(y^2 - 5327.7^2) + 3.537 \times 10^{-5}(y + 5327.7).$$

Since G(−5327.7) = 0 and G(5848.1) = 0.4146, we know that the 0.03 quantile is in this range.
Setting G(y) = 0.03, we find y = −2321.9. So VaR is 2321.9.


11. The quantile function of X can be found from Example 3.3.8 with a = 0 and b = 1. It is $F^{-1}(p) = p$.
So, the IQR is 0.75 − 0.25 = 0.5.

12. The c.d.f. is $F(x) = 1 - \exp(-x)$, for x > 0 and F(x) = 0 for x ≤ 0. The quantile function is
$F^{-1}(p) = -\log(1 - p)$. So, the 0.25 and 0.75 quantiles are respectively −log(0.75) = 0.2877 and
−log(0.25) = 1.3863. The IQR is then 1.3863 − 0.2877 = 1.0986.
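The exponential IQR can be verified with the quantile function directly. This short sketch is illustrative; the helper `exp_quantile` and its `rate` parameter are hypothetical names, with rate 1 matching the c.d.f. 1 − exp(−x) used above.

```python
import math

def exp_quantile(p, rate=1.0):
    """Quantile function of the exponential distribution: F^{-1}(p) = -log(1 - p) / rate."""
    return -math.log(1.0 - p) / rate

q25 = exp_quantile(0.25)
q75 = exp_quantile(0.75)
iqr = q75 - q25   # equals log(3) when rate = 1

print(round(q25, 4), round(q75, 4), round(iqr, 4))
```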


13. From Table 3.1, we find the 0.25 and 0.75 quantiles of the distribution of X to be 1 and 2 respectively.
This makes the IQR equal to 2 − 1 = 1.


14. The result will follow from the following general result: If x is the p quantile of X and a > 0, then ax
is the p quantile of Y = aX. To prove this, let F be the c.d.f. of X. Note that x is the greatest lower
bound on the set $C_p = \{z : F(z) \ge p\}$. Let G be the c.d.f. of Y, then G(z) = F(z/a) because Y ≤ z if
and only if aX ≤ z if and only if X ≤ z/a. The p quantile of Y is the greatest lower bound on the set

$$D_p = \{y : G(y) \ge p\} = \{y : F(y/a) \ge p\} = \{az : F(z) \ge p\} = aC_p,$$

where the third equality follows from the fact that F(y/a) ≥ p if and only if y = za where F(z) ≥ p.
The greatest lower bound on $aC_p$ is a times the greatest lower bound on $C_p$ because a > 0.


4.4 Moments 


Commentary 


The moment generating function (m.g.f.) is a challenging topic that is introduced in this section. The m.g.f. 
is used later in the text to outline a proof of the central limit theorem (Sec. 6.3). It is also used in a few 
places to show that certain sums of random variables have particular distributions (e.g., Poisson, Bernoulli, 
exponential). If students are not going to study the proofs of these results, one could skip the material on 
moment generating functions. 


Solutions to Exercises 


1. Since the uniform p.d.f. is symmetric with respect to its mean $\mu = (a + b)/2$, it follows that $E[(X - \mu)^5] = 0$.


2. The mean of X is (b + a)/2, so the 2kth central moment of X is the mean of $(X - [b + a]/2)^{2k}$. Note
that Y = X − [b + a]/2 has the uniform distribution on the interval [−(b − a)/2, (b − a)/2]. Also,
Z = 2Y/(b − a) has the uniform distribution on the interval [−1, 1]. So $E(Y^{2k}) = [(b - a)/2]^{2k}E(Z^{2k})$.
Since Z has p.d.f. 1/2 on [−1, 1], $E(Z^{2k}) = \int_{-1}^{1} z^{2k}/2\,dz = 1/(2k + 1)$.
So, the 2kth central moment of X is $[(b - a)/2]^{2k}/(2k + 1)$.


3. $E[(X - \mu)^3] = E[(X - 1)^3] = E(X^3 - 3X^2 + 3X - 1) = 5 - 3(2) + 3(1) - 1 = 1.$


4. Since Var(X) ≥ 0 and $\mathrm{Var}(X) = E(X^2) - [E(X)]^2$, it follows that $E(X^2) \ge [E(X)]^2$. The second part
of the exercise follows from Theorem 4.3.3.


5. Let $Y = (X - \mu)^2$. Then by Exercise 4,

$$E(Y^2) = E[(X - \mu)^4] \ge [E(Y)]^2 = [\mathrm{Var}(X)]^2 = \sigma^4.$$
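6. Since

$$f(x) = \begin{cases} \dfrac{1}{b - a} & \text{for } a < x < b, \\ 0 & \text{otherwise,} \end{cases}$$

then

$$\psi(t) = \int_a^b \exp(tx)\frac{1}{b - a}\,dx.$$

Therefore, for t ≠ 0,

$$\psi(t) = \frac{\exp(tb) - \exp(ta)}{t(b - a)}.$$

As always, $\psi(0) = 1$.


7. $\psi'(t) = \frac{1}{4}(3\exp(t) - \exp(-t))$ and $\psi''(t) = \frac{1}{4}(3\exp(t) + \exp(-t))$. Therefore, $\mu = \psi'(0) = 1/2$ and

$$\sigma^2 = \psi''(0) - \mu^2 = 1 - \left(\frac{1}{2}\right)^2 = \frac{3}{4}.$$


8. $\psi'(t) = (2t + 3)\exp(t^2 + 3t)$ and $\psi''(t) = (2t + 3)^2\exp(t^2 + 3t) + 2\exp(t^2 + 3t)$. Therefore, $\mu = \psi'(0) = 3$
and $\sigma^2 = \psi''(0) - \mu^2 = 11 - (3)^2 = 2$.


9. $\psi_2'(t) = c\psi_1'(t)\exp(c[\psi_1(t) - 1])$ and $\psi_2''(t) = \{[c\psi_1'(t)]^2 + c\psi_1''(t)\}\exp(c[\psi_1(t) - 1])$. We know that

$$\psi_1(0) = 1, \quad \psi_1'(0) = \mu, \quad\text{and}\quad \psi_1''(0) = \sigma^2 + \mu^2.$$

Therefore, $E(Y) = \psi_2'(0) = c\mu$ and

$$\mathrm{Var}(Y) = \psi_2''(0) - [E(Y)]^2 = \{(c\mu)^2 + c(\sigma^2 + \mu^2)\} - (c\mu)^2 = c(\sigma^2 + \mu^2).$$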

10. The m.g.f. of Z is

$$\begin{aligned}
E(\exp(tZ)) &= E[\exp(t(2X - 3Y + 4))] \\
&= \exp(4t)E(\exp(2tX))E(\exp(-3tY)) \quad\text{(since X and Y are independent)} \\
&= \exp(4t)\psi(2t)\psi(-3t) \\
&= \exp(4t)\exp(4t^2 + 6t)\exp(9t^2 - 9t) \\
&= \exp(13t^2 + t).
\end{aligned}$$


11. If X can take only a finite number of values $x_1, \ldots, x_k$ with probabilities $p_1, \ldots, p_k$, respectively, then
the m.g.f. of X will be

$$\psi(t) = p_1\exp(tx_1) + p_2\exp(tx_2) + \cdots + p_k\exp(tx_k).$$

By matching this expression for $\psi(t)$ with the expression given in the exercise, it can be seen that X
can take only the three values 1, 4, and 8, and that f(1) = 1/5, f(4) = 2/5, and f(8) = 2/5.
12. We shall first rewrite $\psi(t)$ as follows:

$$\psi(t) = \frac{4}{6}\exp(0) + \frac{1}{6}\exp(t) + \frac{1}{6}\exp(-t).$$

By reasoning as in Exercise 11, it can now be seen that X can take only the three values 0, 1, and −1,
and that f(0) = 4/6, f(1) = 1/6, and f(−1) = 1/6.

13. The m.g.f. of a Cauchy random variable would be

$$\psi(t) = \int_{-\infty}^{\infty} \frac{\exp(tx)}{\pi(1 + x^2)}\,dx. \qquad \text{(S.4.2)}$$

If t > 0, $\lim_{x \to \infty} \exp(tx)/(1 + x^2) = \infty$, so the integral in Eq. (S.4.2) is infinite. Similarly, if t < 0,
$\lim_{x \to -\infty} \exp(tx)/(1 + x^2) = \infty$, so the integral is still infinite. Only for t = 0 is the integral finite, and
that value is $\psi(0) = 1$ as it is for every random variable.


14. The m.g.f. is

$$\psi(t) = \int_1^{\infty} \frac{\exp(tx)}{x^2}\,dx.$$

If t ≤ 0, exp(tx) is bounded, so the integral is finite. If t > 0, then $\lim_{x \to \infty} \exp(tx)/x^2 = \infty$, and the
integral is infinite.

15. Let X have a discrete distribution with p.f. f(x). Assume that $E(|X|^a) < \infty$ for some a > 0. Let
0 < b < a. Then

$$E(|X|^b) = \sum_x |x|^b f(x) = \sum_{|x| \le 1} |x|^b f(x) + \sum_{|x| > 1} |x|^b f(x)
\le 1 + \sum_{|x| > 1} |x|^a f(x) \le 1 + E(|X|^a) < \infty,$$

where the first inequality follows from the fact that $0 \le |x|^b \le 1$ for all |x| ≤ 1 and $|x|^b < |x|^a$ for all
|x| > 1. The next-to-last inequality follows from the fact that the final sum is only part of the sum
that makes up $E(|X|^a)$.


16. Let Z = n − X. It is easy to see that Z has the same distribution as Y since, if X is the number of
successes in n independent Bernoulli trials with probability of success p, then Z is the number of failures
and the probability of failure is 1 − p. It is known from Theorem 4.3.5 that Var(Z) = Var(X), which
also equals Var(Y). Also E(Z) = n − E(X), so Z − E(Z) = n − X − n + E(X) = E(X) − X. Hence
the third central moment of Z is the negative of the third central moment of X and the skewnesses are
negatives of each other.


17. We already computed the mean $\mu = 1$ and variance $\sigma^2 = 1$ in Example 4.4.3. Using the m.g.f., the
third moment is computed from the third derivative:

$$\psi'''(t) = \frac{6}{(1 - t)^4}.$$

The third moment is 6. The third central moment is

$$E([X - 1]^3) = E(X^3) - 3E(X^2) + 3E(X) - 1 = 6 - 6 + 3 - 1 = 2.$$

The skewness is then 2/1 = 2.
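The skewness calculation can be retraced from the raw moments. The sketch below assumes the distribution in Example 4.4.3 is the exponential distribution with rate 1, whose kth raw moment is k!. That assumption is consistent with the mean 1, variance 1, and third moment 6 quoted above, but it is not stated in this solution. The function name `raw_moment` is hypothetical.

```python
import math

def raw_moment(k):
    """k-th raw moment E(X^k) of the exponential distribution with rate 1 (equals k!)."""
    return float(math.factorial(k))

mu = raw_moment(1)
variance = raw_moment(2) - mu ** 2

# Expand E[(X - mu)^3] in terms of raw moments.
third_central = (raw_moment(3) - 3.0 * mu * raw_moment(2)
                 + 3.0 * mu ** 2 * raw_moment(1) - mu ** 3)

skewness = third_central / variance ** 1.5
print(mu, variance, third_central, skewness)
```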
4.5 The Mean and the Median


Solutions to Exercises


1. The 1/2 quantile defined in Definition 3.3.2 applies to a continuous random variable whose c.d.f. is
one-to-one. The 1/2 quantile is then $x_0 = F^{-1}(1/2)$. That is, $F(x_0) = 1/2$. In order for a number m
to be a median as defined in this section, it must be that Pr(X ≤ m) ≥ 1/2 and Pr(X ≥ m) ≥ 1/2.
If X has a continuous distribution, then Pr(X ≤ m) = F(m) and Pr(X ≥ m) = 1 − F(m). Since
$F(x_0) = 1/2$, $m = x_0$ is a median.


2. In this example, $\sum_{x=1}^{6} f(x) = 21c$. Therefore, c = 1/21. Since Pr(X ≤ 5) = 15/21 and Pr(X ≥ 5) =
11/21, it follows that 5 is a median and it can be verified that it is the unique median.


3. A median m must satisfy the equation

$$\int_0^m \exp(-x)\,dx = \frac{1}{2}.$$

Therefore, 1 − exp(−m) = 1/2. It follows that m = log 2 is the unique median of this distribution.


4. Let X denote the number of children per family. Then

$$\Pr(X \le 2) = \frac{21 + 40 + 42}{153} \ge \frac{1}{2}$$

and

$$\Pr(X \ge 2) = \frac{42 + 27 + 23}{153} \ge \frac{1}{2}.$$

Therefore, 2 is the unique median. Since all families with 4 or more children are in the upper half of
the distribution no matter how many children they have (so long as it is at least 4), it doesn’t matter
how they are distributed among the values 4, 5, .... Next, let Y = min{X, 4}, that is, Y is the number
of children per family if we assume that all families with more than 4 children have exactly 4. We can
compute the mean of Y as

$$E(Y) = \frac{1}{153}(0 \times 21 + 1 \times 40 + 2 \times 42 + 3 \times 27 + 4 \times 23) = 1.941.$$
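The median check and the capped mean can be reproduced with exact fractions. This is a small sketch using the family counts that appear in the formula for E(Y); the dictionary `counts` and the helper `is_median` are hypothetical names, and the key 4 stands for "4 or more children".

```python
from fractions import Fraction

# Number of families by number of children; key 4 means "4 or more".
counts = {0: 21, 1: 40, 2: 42, 3: 27, 4: 23}
total = sum(counts.values())   # 153

def is_median(m):
    """m is a median if Pr(X <= m) >= 1/2 and Pr(X >= m) >= 1/2."""
    at_most = sum(c for k, c in counts.items() if k <= m)
    at_least = sum(c for k, c in counts.items() if k >= m)
    half = Fraction(1, 2)
    return Fraction(at_most, total) >= half and Fraction(at_least, total) >= half

medians = [m for m in sorted(counts) if is_median(m)]

# Mean of Y = min{X, 4}: every family with 4 or more children counts as 4.
mean_capped = Fraction(sum(k * c for k, c in counts.items()), total)

print(medians, mean_capped, float(mean_capped))
```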


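5. The p.d.f. of X will be h(x) = [f(x) + g(x)]/2 for −∞ < x < ∞. Therefore,

$$E(X) = \frac{1}{2}\int_{-\infty}^{\infty} x[f(x) + g(x)]\,dx = \frac{1}{2}(\mu_f + \mu_g).$$

Since $\int_{-\infty}^{1} h(x)\,dx = \frac{1}{2}\int_{-\infty}^{1} f(x)\,dx = \frac{1}{2}$ and $\int_{2}^{\infty} h(x)\,dx = \frac{1}{2}\int_{2}^{\infty} g(x)\,dx = \frac{1}{2}$, it follows that every value
of m in the interval 1 ≤ m ≤ 2 will be a median.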


6. (a) The required value is the mean E(X), and

$$E(X) = \int_0^1 x \cdot 2x\,dx = \frac{2}{3}.$$


Copyright © 2012 Pearson Education, Inc. Publishing as Addison-Wesley. 


(b) The required value is the median m, where

$$\int_0^m 2x\,dx = \frac{1}{2}.$$

Therefore, $m = 1/\sqrt{2}$.


7. (a) The required value is E(X), and

$$E(X) = \int_0^1 x\left(x + \frac{1}{2}\right)dx = \frac{7}{12}.$$

(b) The required value is the median m, where

$$\int_0^m \left(x + \frac{1}{2}\right)dx = \frac{1}{2}.$$

Therefore, $m = (\sqrt{5} - 1)/2$.


8. $E[(X - d)^4] = E(X^4) - 4E(X^3)d + 6E(X^2)d^2 - 4E(X)d^3 + d^4$. Since the distribution of X is symmetric
with respect to 0,

$$E(X) = E(X^3) = 0.$$

Therefore,

$$E[(X - d)^4] = E(X^4) + 6E(X^2)d^2 + d^4.$$

For any given nonnegative values of $E(X^4)$ and $E(X^2)$, this is a polynomial of fourth degree in d and
it is a minimum when d = 0.


9. (a) The required point is the mean E(X), and
E(X) = (0.2)(−3) + (0.1)(−1) + (0.1)(0) + (0.4)(1) + (0.2)(2) = 0.1.


(b) The required point is the median m. Since Pr(X ≤ 1) = 0.8 and Pr(X ≥ 1) = 0.6, the point 1 is
the unique median.


10. Let $x_1 < \cdots < x_n$ denote the locations of the n houses and let d denote the location of the store. We must choose d to minimize $\sum_{i=1}^n |x_i - d|$ or equivalently to minimize $\sum_{i=1}^n |x_i - d|/n$. This sum can be interpreted as the M.A.E. of d for a discrete distribution in which each of the n points $x_1, \ldots, x_n$ has probability 1/n. Therefore, d should be chosen equal to a median of this distribution. If n is odd, then the middle value among $x_1, \ldots, x_n$ is the unique median. If n is even, then any point between the two middle values among $x_1, \ldots, x_n$ (including the two middle values themselves) will be a median.
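The claim that a median minimizes the total distance can be checked numerically. The following is a minimal sketch, assuming a hypothetical set of five house locations (these numbers are not from the exercise); it compares the median against a grid of candidate store locations.

```python
from statistics import median

def total_distance(houses, d):
    """Sum of |x_i - d| over all house locations."""
    return sum(abs(x - d) for x in houses)

# Hypothetical locations (n odd), chosen only for illustration.
houses = [1.0, 2.0, 4.0, 7.0, 11.0]
best = median(houses)  # middle value of the sorted locations

# Brute-force comparison over a grid of candidate store locations in [0, 12].
grid = [i / 100 for i in range(0, 1201)]
grid_best = min(grid, key=lambda d: total_distance(houses, d))

print(best, grid_best, total_distance(houses, best))
```

With n odd the grid search and the median agree on a single point; with n even, every point between the two middle values would give the same total distance.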


11. The M.S.E. of any prediction is a minimum when the prediction is equal to the mean of the variable
being predicted, and the minimum value of the M.S.E. is then the variance of the variable. It was 
shown in the derivation of Eq. (4.3.3) that the variance of the binomial distribution with parameters 
n and p is np(1—p). Therefore, the minimum M.S.E. that can be attained when predicting X is 
Var(X) = 7(1/4)(3/4) = 21/16 and the minimum M.S.E. that can be attained when predicting Y is 
Var(Y) = 5(1/2)(1/2) = 5/4 = 20/16. Thus, Y can be predicted with the smaller M.S.E. 


12. (a) The required value is the mean E(X). The random variable X will have the binomial distribution with parameters n = 15 and p = 0.3. Therefore, E(X) = np = 4.5.

(b) The required value is the median of the binomial distribution with parameters n = 15 and p = 0.3. From the table of this distribution given in the back of the book, it is found that 4 is the unique median.
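Instead of reading the table, the median can be recomputed directly from the binomial p.f. The sketch below is one way to do it, under the assumption that only integer candidates need to be checked (a non-integer median can occur only if some cumulative probability equals 1/2 exactly).

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial p.f. evaluated at k."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def medians(n, p):
    """All integers m with Pr(X <= m) >= 1/2 and Pr(X >= m) >= 1/2."""
    pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
    found = []
    for m in range(n + 1):
        pr_le = sum(pmf[: m + 1])   # Pr(X <= m)
        pr_ge = sum(pmf[m:])        # Pr(X >= m)
        if pr_le >= 0.5 and pr_ge >= 0.5:
            found.append(m)
    return found

median_values = medians(15, 0.3)
print(median_values)
```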


13. To say that the distribution of X is symmetric around m means that X and 2m − X have the same distribution. That is, $\Pr(X \le x) = \Pr(2m - X \le x)$ for all x. This can be rewritten as $\Pr(X \le x) = \Pr(X \ge 2m - x)$. With x = m, we see that $\Pr(X \le m) = \Pr(X \ge m)$. If $\Pr(X \le m) < 1/2$, then $\Pr(X \le m) + \Pr(X \ge m) < 1$, which is impossible. Hence $\Pr(X \le m) \ge 1/2$ and $\Pr(X \ge m) \ge 1/2$, and m is a median.


14. The Cauchy distribution is symmetric around 0, so 0 is a median by Exercise 13. Since the p.d.f. of
the Cauchy distribution is strictly positive everywhere, the c.d.f. will be one-to-one and 0 is the unique 
median. 


15. (a) Since a is assumed to be a median, $F(a) = \Pr(X \le a) \ge 1/2$. Since b > a is assumed to be a median, $\Pr(X \ge b) \ge 1/2$. If $\Pr(X \le a) > 1/2$, then $\Pr(X \le a) + \Pr(X \ge b) > 1$. But $\{X \le a\}$ and $\{X \ge b\}$ are disjoint events, so the sum of their probabilities can't be greater than 1. This means that F(a) > 1/2 is impossible, so F(a) = 1/2.

(b) The c.d.f. F is nondecreasing, so A = {x : F(x) = 1/2} is an interval. Since F is continuous from the right, the lower endpoint c of the interval A must also be in A. For every x, $\Pr(X \le x) + \Pr(X \ge x) \ge 1$. For every $x \in A$, $\Pr(X \le x) = 1/2$, hence it must be that $\Pr(X \ge x) \ge 1/2$ and x is a median. Let d be the upper endpoint of the interval A. We need to show that d is also a median. Since F is not necessarily continuous from the left, F(d) > 1/2 is possible. If F(d) = 1/2, then $d \in A$ and d is a median by the argument just given. If F(d) > 1/2, then Pr(X = d) = F(d) − 1/2. This makes

$$\Pr(X \ge d) = \Pr(X > d) + \Pr(X = d) = 1 - F(d) + F(d) - 1/2 = 1/2.$$

Hence d is also a median.

(c) If X has a discrete distribution, then clearly F must be discontinuous at d, otherwise F(x) = 1/2 even for some x > d and d would not be the right endpoint of A.


16. We know that 1 = Pr(X < m) + Pr(X = m) + Pr(X > m). Since Pr(X < m) = Pr(X > m), both $\Pr(X < m) \le 1/2$ and $\Pr(X > m) \le 1/2$, otherwise their sum would be more than 1. Since $\Pr(X < m) \le 1/2$, $\Pr(X \ge m) = 1 - \Pr(X < m) \ge 1/2$. Similarly, $\Pr(X \le m) = 1 - \Pr(X > m) \ge 1/2$. Hence m is a median.


17. As in the previous problem, 1 = Pr(X < m) + Pr(X = m) + Pr(X > m). Since Pr(X < m) < 1/2 and Pr(X > m) < 1/2, we have $\Pr(X \ge m) = 1 - \Pr(X < m) > 1/2$ and $\Pr(X \le m) = 1 - \Pr(X > m) > 1/2$. Hence m is a median. Let k > m. Then $\Pr(X \ge k) \le \Pr(X > m) < 1/2$, and k is not a median. Similarly, if k < m, then $\Pr(X \le k) \le \Pr(X < m) < 1/2$, and k is not a median. So, m is the unique median.




18. Let m be the p quantile of X, and let r be strictly increasing. Let Y = r(X) and let G(y) be the c.d.f. of Y while F(x) is the c.d.f. of X. Since $Y \le y$ if and only if $r(X) \le y$ if and only if $X \le r^{-1}(y)$, we have $G(y) = F(r^{-1}(y))$. The p quantile of Y is the smallest element of the set

$$C_p = \{y : G(y) \ge p\} = \{y : F(r^{-1}(y)) \ge p\} = \{r(x) : F(x) \ge p\}.$$

Also, m is the smallest x such that $F(x) \ge p$. Because r is strictly increasing, the smallest r(x) such that $F(x) \ge p$ is r(m). Hence, r(m) is the smallest number in $C_p$.

4.6 Covariance and Correlation


Solutions to Exercises 


1. The location of the circle makes no difference since it only affects the means of X and Y. So, we 
shall assume that the circle is centered at (0,0). As in Example 4.6.5, Cov(X,Y) = 0. It follows that 
$\rho(X, Y) = 0$ also.


2. We shall follow the hint given in this exercise. The relation $[(X - \mu_X) + (Y - \mu_Y)]^2 \ge 0$ implies that

$$-(X - \mu_X)(Y - \mu_Y) \le \frac{1}{2}\left[(X - \mu_X)^2 + (Y - \mu_Y)^2\right].$$

Similarly, the relation $[(X - \mu_X) - (Y - \mu_Y)]^2 \ge 0$ implies that

$$(X - \mu_X)(Y - \mu_Y) \le \frac{1}{2}\left[(X - \mu_X)^2 + (Y - \mu_Y)^2\right].$$

Hence, it follows that

$$|(X - \mu_X)(Y - \mu_Y)| \le \frac{1}{2}\left[(X - \mu_X)^2 + (Y - \mu_Y)^2\right].$$

By taking expectations on both sides of this relation, we find that

$$E\left[|(X - \mu_X)(Y - \mu_Y)|\right] \le \frac{1}{2}\left[\mathrm{Var}(X) + \mathrm{Var}(Y)\right] < \infty.$$

Since the expectation on the left side of this relation is finite, it follows that $\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$ exists and is finite.


3. Since the p.d.f. of X is symmetric with respect to 0, it follows that E(X) = 0 and that $E(X^k) = 0$ for every odd positive integer k. Therefore, $E(XY) = E(X^7) = 0$. Since E(XY) = 0 and E(X)E(Y) = 0, it follows that Cov(X, Y) = 0 and $\rho(X, Y) = 0$.


4. It follows from the assumption that $0 < E(X^4) < \infty$, that $0 < \sigma_X^2 < \infty$ and $0 < \sigma_Y^2 < \infty$. Hence, $\rho(X, Y)$ is well defined. Since the distribution of X is symmetric with respect to 0, E(X) = 0 and $E(X^3) = 0$. Therefore, $E(XY) = E(X^3) = 0$. It now follows that Cov(X, Y) = 0 and $\rho(X, Y) = 0$.


5. We have $E(aX + b) = a\mu_X + b$ and $E(cY + d) = c\mu_Y + d$. Therefore,

$$\mathrm{Cov}(aX + b, cY + d) = E[(aX + b - a\mu_X - b)(cY + d - c\mu_Y - d)] = E[ac(X - \mu_X)(Y - \mu_Y)] = ac\,\mathrm{Cov}(X, Y).$$


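6. By Exercise 5, Cov(U, V) = ac Cov(X, Y). Also, $\mathrm{Var}(U) = a^2\sigma_X^2$ and $\mathrm{Var}(V) = c^2\sigma_Y^2$. Hence

$$\rho(U, V) = \frac{ac\,\mathrm{Cov}(X, Y)}{|a|\sigma_X \cdot |c|\sigma_Y} = \begin{cases} \rho(X, Y) & \text{if } ac > 0, \\ -\rho(X, Y) & \text{if } ac < 0. \end{cases}$$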
7. We have $E(aX + bY + c) = a\mu_X + b\mu_Y + c$. Therefore,

$$\begin{aligned}
\mathrm{Cov}(aX + bY + c, Z) &= E[(aX + bY + c - a\mu_X - b\mu_Y - c)(Z - \mu_Z)] \\
&= E\{[a(X - \mu_X) + b(Y - \mu_Y)](Z - \mu_Z)\} \\
&= aE[(X - \mu_X)(Z - \mu_Z)] + bE[(Y - \mu_Y)(Z - \mu_Z)] \\
&= a\,\mathrm{Cov}(X, Z) + b\,\mathrm{Cov}(Y, Z).
\end{aligned}$$


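8. We have

$$\begin{aligned}
\mathrm{Cov}\left(\sum_{i=1}^m a_i X_i, \sum_{j=1}^n b_j Y_j\right) &= E\left[\sum_{i=1}^m a_i(X_i - \mu_{X_i}) \sum_{j=1}^n b_j(Y_j - \mu_{Y_j})\right] \\
&= E\left[\sum_{i=1}^m \sum_{j=1}^n a_i b_j (X_i - \mu_{X_i})(Y_j - \mu_{Y_j})\right] \\
&= \sum_{i=1}^m \sum_{j=1}^n a_i b_j E\left[(X_i - \mu_{X_i})(Y_j - \mu_{Y_j})\right] \\
&= \sum_{i=1}^m \sum_{j=1}^n a_i b_j \,\mathrm{Cov}(X_i, Y_j).
\end{aligned}$$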
9. Let U = X + Y and V = X − Y. Then

$$E(UV) = E[(X + Y)(X - Y)] = E(X^2 - Y^2) = E(X^2) - E(Y^2).$$

Also,

$$E(U)E(V) = E(X + Y)E(X - Y) = (\mu_X + \mu_Y)(\mu_X - \mu_Y) = \mu_X^2 - \mu_Y^2.$$

Therefore,

$$\mathrm{Cov}(U, V) = E(UV) - E(U)E(V) = [E(X^2) - \mu_X^2] - [E(Y^2) - \mu_Y^2] = \mathrm{Var}(X) - \mathrm{Var}(Y) = 0.$$

It follows that $\rho(U, V) = 0$.
10. Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) and Var(X − Y) = Var(X) + Var(Y) − 2 Cov(X, Y). Since Cov(X, Y) < 0, it follows that

Var(X + Y) < Var(X − Y).
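11. For the given values,

$$\begin{aligned}
\mathrm{Var}(X) &= E(X^2) - [E(X)]^2 = 10 - 9 = 1, \\
\mathrm{Var}(Y) &= E(Y^2) - [E(Y)]^2 = 29 - 4 = 25, \\
\mathrm{Cov}(X, Y) &= E(XY) - E(X)E(Y) = 0 - 6 = -6.
\end{aligned}$$

Therefore,

$$\rho(X, Y) = \frac{-6}{(1)(5)} = -\frac{6}{5}, \text{ which is impossible.}$$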
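12.

$$\begin{aligned}
E(X) &= \int_0^1\!\!\int_0^2 x \cdot \frac{1}{3}(x + y)\,dy\,dx = \frac{5}{9}, \\
E(Y) &= \int_0^1\!\!\int_0^2 y \cdot \frac{1}{3}(x + y)\,dy\,dx = \frac{11}{9}, \\
E(X^2) &= \int_0^1\!\!\int_0^2 x^2 \cdot \frac{1}{3}(x + y)\,dy\,dx = \frac{7}{18}, \\
E(Y^2) &= \int_0^1\!\!\int_0^2 y^2 \cdot \frac{1}{3}(x + y)\,dy\,dx = \frac{16}{9}, \\
E(XY) &= \int_0^1\!\!\int_0^2 xy \cdot \frac{1}{3}(x + y)\,dy\,dx = \frac{2}{3}.
\end{aligned}$$

Therefore,

$$\begin{aligned}
\mathrm{Var}(X) &= \frac{7}{18} - \left(\frac{5}{9}\right)^2 = \frac{13}{162}, \\
\mathrm{Var}(Y) &= \frac{16}{9} - \left(\frac{11}{9}\right)^2 = \frac{23}{81}, \\
\mathrm{Cov}(X, Y) &= \frac{2}{3} - \left(\frac{5}{9}\right)\left(\frac{11}{9}\right) = -\frac{1}{81}.
\end{aligned}$$

It now follows that

$$\mathrm{Var}(2X - 3Y + 8) = 4\,\mathrm{Var}(X) + 9\,\mathrm{Var}(Y) - (2)(2)(3)\,\mathrm{Cov}(X, Y) = \frac{245}{81}.$$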
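The moments in Exercise 12 can be reproduced with exact rational arithmetic. This is a minimal sketch, assuming the joint p.d.f. is (x + y)/3 on 0 ≤ x ≤ 1, 0 ≤ y ≤ 2, as read off the integrals in the solution; the helper names `moment` and `mono` are illustrative only.

```python
from fractions import Fraction as F

def moment(a, b):
    """E(X^a Y^b) for the p.d.f. f(x, y) = (x + y)/3 on [0, 1] x [0, 2]."""
    def mono(i, j):
        # Integral of x^i * y^j over the rectangle [0, 1] x [0, 2].
        return F(1, i + 1) * F(2 ** (j + 1), j + 1)
    # x^a y^b (x + y) / 3 splits into two monomials.
    return F(1, 3) * (mono(a + 1, b) + mono(a, b + 1))

ex, ey = moment(1, 0), moment(0, 1)
var_x = moment(2, 0) - ex**2
var_y = moment(0, 2) - ey**2
cov = moment(1, 1) - ex * ey
# Var(2X - 3Y + 8) = 4 Var(X) + 9 Var(Y) - 12 Cov(X, Y)
var_total = 4 * var_x + 9 * var_y - 12 * cov

print(ex, ey, var_x, var_y, cov, var_total)
```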
13. $\mathrm{Cov}(X, Y) = \rho(X, Y)\sigma_X\sigma_Y = -\frac{1}{6}(3)(2) = -1$.
(a) Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) = 11.
(b) Var(X − 3Y + 4) = Var(X) + 9 Var(Y) − (2)(3) Cov(X, Y) = 51.
14. (a) Var(X + Y + Z) = Var(X) + Var(Y) + Var(Z) + 2 Cov(X, Y) + 2 Cov(X, Z) + 2 Cov(Y, Z) = 17.
(b) Var(3X − Y − 2Z + 1) = 9 Var(X) + Var(Y) + 4 Var(Z) − 6 Cov(X, Y) − 12 Cov(X, Z) + 4 Cov(Y, Z) = 59.


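15. Since each variance is equal to 1 and each covariance is equal to 1/4,

$$\mathrm{Var}(X_1 + \cdots + X_n) = \sum_i \mathrm{Var}(X_i) + 2\sum\sum_{i<j} \mathrm{Cov}(X_i, X_j) = n(1) + 2 \cdot \frac{n(n-1)}{2} \cdot \frac{1}{4} = n + \frac{n(n-1)}{4}.$$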


16. We need the cost to be 6000 dollars, so that $50s_1 + 30s_2 = 6000$. We also need the variance to be 0. The variance of $s_1R_1 + s_2R_2$ is

$$s_1^2\,\mathrm{Var}(R_1) + s_2^2\,\mathrm{Var}(R_2) + 2s_1s_2\,\mathrm{Cov}(R_1, R_2).$$

The variances of $R_1$ and $R_2$ are $\mathrm{Var}(R_1) = 75$ and $\mathrm{Var}(R_2) = 17.52$. Since the correlation between $R_1$ and $R_2$ is −1, their covariance is $-(75)^{1/2}(17.52)^{1/2} = -36.25$. To make the variance 0, we need $75s_1^2 + 17.52s_2^2 - 2(36.25)s_1s_2 = 0$. This equation can be rewritten $(75^{1/2}s_1 - 17.52^{1/2}s_2)^2 = 0$. So, we need to solve the two equations

$$75^{1/2}s_1 - 17.52^{1/2}s_2 = 0, \quad\text{and}\quad 50s_1 + 30s_2 = 6000.$$

The solution is $s_1 = 53.54$ and $s_2 = 110.77$. The reason that such a portfolio is unrealistic is that it has positive mean (1126.2) but zero variance; that is, one can earn money with no risk. Such a “money pump” would surely dry up the moment anyone recognized it.
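The two equations can be solved by substitution. The sketch below uses only the numbers quoted in the solution (variances 75 and 17.52, prices 50 and 30, budget 6000) and checks that the resulting portfolio variance is numerically zero.

```python
from math import sqrt

var1, var2 = 75.0, 17.52          # Var(R1), Var(R2) as quoted in the solution
a, b = sqrt(var1), sqrt(var2)     # zero variance requires a*s1 - b*s2 = 0

# Substitute s2 = (a / b) * s1 into the budget constraint 50*s1 + 30*s2 = 6000.
s1 = 6000.0 / (50.0 + 30.0 * a / b)
s2 = (a / b) * s1

# Portfolio variance with correlation -1, i.e. Cov(R1, R2) = -a*b.
variance = var1 * s1**2 + var2 * s2**2 - 2 * a * b * s1 * s2

print(round(s1, 2), round(s2, 2), variance)
```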


17. Let $\mu_X = E(X)$ and $\mu_Y = E(Y)$. Apply Theorem 4.6.2 with $U = X - \mu_X$ and $V = Y - \mu_Y$. Then (4.6.4) becomes

$$\mathrm{Cov}(X, Y)^2 \le \mathrm{Var}(X)\,\mathrm{Var}(Y). \qquad (S.4.3)$$

Now $|\rho(X, Y)| = 1$ is equivalent to equality in (S.4.3). According to Theorem 4.6.2, we get equality in (4.6.4) and (S.4.3) if and only if there exist constants a and b such that aU + bV = 0, that is $a(X - \mu_X) + b(Y - \mu_Y) = 0$, with probability 1. So $|\rho(X, Y)| = 1$ implies $aX + bY = a\mu_X + b\mu_Y$ with probability 1.


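18. The means of X and Y are the same since f(x, y) = f(y, x) for all x and y. The mean of X (and the mean of Y) is

$$E(X) = \int_0^1\!\!\int_0^1 x(x + y)\,dx\,dy = \int_0^1 \left(\frac{1}{3} + \frac{y}{2}\right)dy = \frac{1}{3} + \frac{1}{4} = \frac{7}{12}.$$

Also,

$$E(XY) = \int_0^1\!\!\int_0^1 xy(x + y)\,dx\,dy = \int_0^1 \left(\frac{y}{3} + \frac{y^2}{2}\right)dy = \frac{1}{6} + \frac{1}{6} = \frac{1}{3}.$$

So,

$$\mathrm{Cov}(X, Y) = \frac{1}{3} - \left(\frac{7}{12}\right)^2 = -\frac{1}{144} \approx -0.00694.$$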


4.7 Conditional Expectation 


Solutions to Exercises 


1. The M.S.E. after observing X = 18 is $\mathrm{Var}(P \mid 18) = 19 \times (41 - 18)/[42^2 \times 43] = 0.00576$. This is about seven percent of the marginal M.S.E.


2. If X denotes the score of the selected student, then

E(X) = E[E(X | School)] = (0.2)(80) + (0.3)(76) + (0.5)(84) = 80.8.

3. Since E(X | Y) = c, then E(X) = E[E(X | Y)] = c and

E(XY) = E[E(XY | Y)] = E[Y E(X | Y)] = E(cY) = cE(Y).

Therefore,

Cov(X, Y) = E(XY) − E(X)E(Y) = cE(Y) − cE(Y) = 0.
4. Since X is symmetric with respect to 0, $E(X^k) = 0$ for every odd integer k. Therefore,

$$E(X^{2m}Y) = E[E(X^{2m}Y \mid X)] = E[X^{2m}E(Y \mid X)] = E(aX^{2m+1} + bX^{2m}) = bE(X^{2m}).$$

Also,

$$E(Y) = E[E(Y \mid X)] = E(aX + b) = b.$$

It follows that

$$\mathrm{Cov}(X^{2m}, Y) = E(X^{2m}Y) - E(X^{2m})E(Y) = bE(X^{2m}) - bE(X^{2m}) = 0.$$
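5. For any given value $x_{n-1}$ of $X_{n-1}$, $E(X_n \mid x_{n-1})$ will be the midpoint of the interval $(x_{n-1}, 1)$. Therefore,

$$E(X_n \mid X_{n-1}) = \frac{1 + X_{n-1}}{2}.$$

It follows that

$$E(X_n) = E[E(X_n \mid X_{n-1})] = \frac{1}{2} + \frac{1}{2}E(X_{n-1}).$$

Similarly, $E(X_{n-1}) = \frac{1}{2} + \frac{1}{2}E(X_{n-2})$, etc. Since $E(X_1) = \frac{1}{2}$, we obtain

$$E(X_n) = \frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2^n} = 1 - \frac{1}{2^n}.$$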
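The recursion for the means can be iterated exactly to confirm the closed form. A minimal sketch follows; the function name `mean_xn` is illustrative.

```python
from fractions import Fraction as F

def mean_xn(n):
    """Iterate E(X_k) = 1/2 + E(X_{k-1})/2 starting from E(X_1) = 1/2."""
    e = F(1, 2)
    for _ in range(n - 1):
        e = F(1, 2) + e / 2
    return e

for n in range(1, 6):
    print(n, mean_xn(n), 1 - F(1, 2**n))
```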


6. The joint p.d.f. of X and Y is

$$f(x, y) = \begin{cases} 1/\pi & \text{for } x^2 + y^2 < 1, \\ 0 & \text{otherwise.} \end{cases}$$

Therefore, for any given value of y in the interval −1 < y < 1, the conditional p.d.f. of X given that Y = y will be of the form

$$g_1(x \mid y) = \frac{f(x, y)}{f_2(y)} = \begin{cases} \dfrac{1}{\pi f_2(y)} & \text{for } -\sqrt{1 - y^2} < x < \sqrt{1 - y^2}, \\ 0 & \text{otherwise.} \end{cases}$$

For each given value of y, this conditional p.d.f. is a constant over an interval of values of x symmetric with respect to x = 0. Therefore, E(X | y) = 0 for each value of y.


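7. The marginal p.d.f. of X is

$$f_1(x) = \int_0^1 (x + y)\,dy = x + \frac{1}{2} \quad \text{for } 0 \le x \le 1.$$

Therefore, for $0 \le x \le 1$, the conditional p.d.f. of Y given that X = x is

$$g(y \mid x) = \frac{f(x, y)}{f_1(x)} = \frac{2(x + y)}{2x + 1} \quad \text{for } 0 \le y \le 1.$$

Hence,

$$E(Y \mid x) = \int_0^1 \frac{2(xy + y^2)}{2x + 1}\,dy = \frac{3x + 2}{3(2x + 1)},$$

$$E(Y^2 \mid x) = \int_0^1 \frac{2(xy^2 + y^3)}{2x + 1}\,dy = \frac{4x + 3}{6(2x + 1)},$$

and

$$\mathrm{Var}(Y \mid x) = \frac{4x + 3}{6(2x + 1)} - \left[\frac{3x + 2}{3(2x + 1)}\right]^2 = \frac{1}{36}\left[3 - \frac{1}{(2x + 1)^2}\right].$$

8. The prediction is $E\left(Y \mid X = \frac{1}{2}\right) = \frac{7}{12}$ and the M.S.E. is $\mathrm{Var}\left(Y \mid X = \frac{1}{2}\right) = \frac{11}{144}$.

9. The overall M.S.E. is

$$E[\mathrm{Var}(Y \mid X)] = \int_0^1 \frac{1}{36}\left[3 - \frac{1}{(2x + 1)^2}\right] f_1(x)\,dx.$$

It was found in the solution of Exercise 7 that

$$f_1(x) = x + \frac{1}{2} \quad \text{for } 0 \le x \le 1.$$

Therefore, it can be found that $E[\mathrm{Var}(Y \mid X)] = \frac{1}{12} - \frac{\log 3}{144}$.

10. It was found in Exercise 9 that when Y is predicted from X, the overall M.S.E. is $\frac{1}{12} - \frac{\log 3}{144}$. Therefore, the total loss would be

$$\frac{1}{12} - \frac{\log 3}{144} + c.$$

If Y is predicted without using X, the M.S.E. is Var(Y). It is found that

$$E(Y) = \int_0^1\!\!\int_0^1 y(x + y)\,dx\,dy = \frac{7}{12}$$

and

$$E(Y^2) = \int_0^1\!\!\int_0^1 y^2(x + y)\,dx\,dy = \frac{5}{12}.$$

Hence, $\mathrm{Var}(Y) = \frac{5}{12} - \left(\frac{7}{12}\right)^2 = \frac{11}{144}$. The total loss when X is used for predicting Y will be less than Var(Y) if and only if $c < \frac{\log 3 - 1}{144}$.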
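The closed form for the overall M.S.E. and the resulting bound on c can be checked by numerical integration. This is a rough sketch using a plain midpoint rule; the helper names are illustrative, and the formulas for Var(Y | x) and the marginal p.d.f. are the ones stated in the solutions to Exercises 7 and 9.

```python
from math import log

def cond_var(x):
    """Var(Y | x) = (1/36) * [3 - 1/(2x + 1)^2]."""
    return (3.0 - 1.0 / (2.0 * x + 1.0) ** 2) / 36.0

def f1(x):
    """Marginal p.d.f. of X on [0, 1]."""
    return x + 0.5

def midpoint(fn, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of fn over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(fn(lo + (i + 0.5) * h) for i in range(n))

overall_mse = midpoint(lambda x: cond_var(x) * f1(x), 0.0, 1.0)
closed_form = 1 / 12 - log(3) / 144
marginal_var = 11 / 144
max_cost = marginal_var - closed_form   # observing X pays off only if c is below this

print(overall_mse, closed_form, max_cost)
```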
11. Let $E(Y) = \mu_Y$. Then

$$\begin{aligned}
\mathrm{Var}(Y) &= E[(Y - \mu_Y)^2] = E\{[(Y - E(Y \mid X)) + (E(Y \mid X) - \mu_Y)]^2\} \\
&= E\{[Y - E(Y \mid X)]^2\} + 2E\{[Y - E(Y \mid X)][E(Y \mid X) - \mu_Y]\} + E\{[E(Y \mid X) - \mu_Y]^2\}.
\end{aligned}$$

We shall now consider further each of the three expectations in the final sum. First,

$$E\{[Y - E(Y \mid X)]^2\} = E\left(E\{[Y - E(Y \mid X)]^2 \mid X\}\right) = E[\mathrm{Var}(Y \mid X)].$$

Next,

$$\begin{aligned}
E\{[Y - E(Y \mid X)][E(Y \mid X) - \mu_Y]\} &= E\left(E\{[Y - E(Y \mid X)][E(Y \mid X) - \mu_Y] \mid X\}\right) \\
&= E\left([E(Y \mid X) - \mu_Y]\,E\{Y - E(Y \mid X) \mid X\}\right) \\
&= E\left([E(Y \mid X) - \mu_Y] \cdot 0\right) \\
&= 0.
\end{aligned}$$

Finally, since the mean of E(Y | X) is $E[E(Y \mid X)] = \mu_Y$, we have

$$E\{[E(Y \mid X) - \mu_Y]^2\} = \mathrm{Var}[E(Y \mid X)].$$

It now follows that

$$\mathrm{Var}(Y) = E[\mathrm{Var}(Y \mid X)] + \mathrm{Var}[E(Y \mid X)].$$
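The decomposition can be verified exactly on a small discrete example. The joint p.f. below is hypothetical and chosen only for illustration; the sketch computes both terms of the decomposition with rational arithmetic.

```python
from fractions import Fraction as F

# Hypothetical joint p.f. f(x, y); the four probabilities sum to 1.
joint = {(0, 0): F(1, 8), (0, 1): F(3, 8), (1, 0): F(1, 4), (1, 2): F(1, 4)}

def mean(pf):
    return sum(v * p for v, p in pf.items())

def var(pf):
    m = mean(pf)
    return sum((v - m) ** 2 * p for v, p in pf.items())

# Marginal p.f.s of X and Y.
fx, fy = {}, {}
for (x, y), p in joint.items():
    fx[x] = fx.get(x, 0) + p
    fy[y] = fy.get(y, 0) + p

# Conditional p.f. of Y given X = x, for each possible x.
cond = {x: {y: p / fx[x] for (xx, y), p in joint.items() if xx == x} for x in fx}

e_cond_var = sum(var(cond[x]) * fx[x] for x in fx)                      # E[Var(Y | X)]
mu_y = mean(fy)
var_cond_mean = sum((mean(cond[x]) - mu_y) ** 2 * fx[x] for x in fx)    # Var[E(Y | X)]

print(var(fy), e_cond_var, var_cond_mean)
```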


12. Since E(Y) = E[E(Y | X)], then

$$E(Y) = aE(X) + b.$$

Also, as found in Example 4.7.7,

$$E(XY) = aE(X^2) + bE(X).$$

By solving these two equations simultaneously for a and b we obtain

$$a = \frac{E(XY) - E(X)E(Y)}{E(X^2) - [E(X)]^2} = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)}$$

and

$$b = E(Y) - aE(X).$$
13. (a) The prediction is the mean of Y:

$$E(Y) = \int_0^1\!\!\int_0^1 y \cdot \frac{2}{5}(2x + 3y)\,dx\,dy = \frac{3}{5}.$$

(b) The prediction is the median m of X. The marginal p.d.f. of X is

$$f_1(x) = \int_0^1 \frac{2}{5}(2x + 3y)\,dy = \frac{1}{5}(4x + 3) \quad \text{for } 0 \le x \le 1.$$

We must have

$$\int_0^m \frac{1}{5}(4x + 3)\,dx = \frac{1}{2}.$$

Therefore, $4m^2 + 6m - 5 = 0$ and $m = \frac{\sqrt{29} - 3}{4}$.


14. First,

$$E(XY) = \int_0^1\!\!\int_0^1 xy \cdot \frac{2}{5}(2x + 3y)\,dx\,dy = \frac{1}{3}.$$

Next, the marginal p.d.f. $f_1$ of X was found in Exercise 13(b). Therefore,

$$E(X) = \int_0^1 x f_1(x)\,dx = \frac{17}{30}.$$

Furthermore, it was found in Exercise 13(a) that E(Y) = 3/5. It follows that Cov(X, Y) = 1/3 − (17/30)(3/5) = 17/51 − 17/50 < 0. Therefore, X and Y are negatively correlated.


15. (a) For $0 \le x \le 1$ and $0 \le y \le 1$, the conditional p.d.f. of Y given that X = x is

$$g(y \mid x) = \frac{f(x, y)}{f_1(x)} = \frac{2(2x + 3y)}{4x + 3}.$$

When X = 0.8, the prediction of Y is

$$E(Y \mid X = 0.8) = \int_0^1 y\,g(y \mid x = 0.8)\,dy = \int_0^1 \frac{y(1.6 + 3y)}{3.1}\,dy = \frac{18}{31}.$$

(b) The marginal p.d.f. of Y is

$$f_2(y) = \int_0^1 \frac{2}{5}(2x + 3y)\,dx = \frac{2}{5}(1 + 3y) \quad \text{for } 0 \le y \le 1.$$

Therefore, for $0 \le x \le 1$ and $0 \le y \le 1$, the conditional p.d.f. of X given that Y = y is

$$h(x \mid y) = \frac{f(x, y)}{f_2(y)} = \frac{2x + 3y}{1 + 3y}.$$

When Y = 1/3, the prediction of X is the median m of the conditional p.d.f. h(x | y = 1/3). We must have

$$\int_0^m \frac{2x + 1}{2}\,dx = \frac{1}{2}.$$

Hence, $m^2 + m = 1$ and $m = (\sqrt{5} - 1)/2$.


16. Rather than repeat the entire proof of Theorem 4.7.3 with the necessary changes, we shall merely point 
out what changes need to be made. Let d(X) be a conditional median of Y given X. Replace all 
squared differences by absolute differences. For example [Y — d(X)]? becomes |Y — d(X)|, [Y — d*(x)]? 
becomes |Y — d*(x)|, and so on. When we refer to Sec. 4.5 near the end of the proof, replace each 
“M.S.E.” by “M.A.E.” and replace the word “mean” by “median” each time it appears in the last four 
sentences. 
17. Let Z = r(X, Y), and let (X, Y) have joint p.f. f(x, y). Also, let $W = r(x_0, Y)$, for some possible value $x_0$ of X. We need to show that the conditional p.f. of Z given $X = x_0$ is the same as the conditional p.f. of W given $X = x_0$ for all $x_0$.

Let $f_1(x)$ be the marginal p.f. of X. For each possible value (z, x) of (Z, X), define $B_{(z,x)} = \{y : r(x, y) = z\}$. Then, (Z, X) = (z, x) if and only if X = x and $Y \in B_{(z,x)}$. The joint p.f. of (Z, X) is then

$$g(z, x) = \sum_{y \in B_{(z,x)}} f(x, y).$$

The conditional p.f. of Z given $X = x_0$ is $g_1(z \mid x_0) = g(z, x_0)/f_1(x_0)$ for all z and all $x_0$.

Next, notice that (W, X) = (w, x) if and only if X = x and $Y \in B_{(w,x_0)}$. The joint p.f. of (W, X) is then

$$h(w, x) = \sum_{y \in B_{(w,x_0)}} f(x, y).$$

The conditional p.f. of W given X = x is $h_1(w \mid x) = h(w, x)/f_1(x)$. Now, for $x = x_0$, we get $h_1(w \mid x_0) = h(w, x_0)/f_1(x_0)$. But $h(w, x_0) = g(w, x_0)$ for all w and all $x_0$. Hence $h_1(w \mid x_0) = g_1(w \mid x_0)$ for all w and all $x_0$. This is the desired conclusion.


4.8 Utility 


Commentary 


It is interesting to have the students in the class determine their own utility functions for any possible gain between, say, 0 dollars and 100 dollars; in other words, to have each student determine their own function U(x) for $0 \le x \le 100$. One method for determining various points on a person’s utility function is as follows:

First, notice that if U(x) is a person’s utility function, then the function V(x) = aU(x) + b, where a and b are constants with a > 0, could also be used as the person’s utility function, because for any two gambles X and Y, we will have E[U(X)] > E[U(Y)] if and only if E[V(X)] = aE[U(X)] + b > E[V(Y)] = aE[U(Y)] + b. Therefore, the function V reflects exactly the same preferences as U. The effect of being able to transform a person’s utility function in this way by choosing any constants a > 0 and b is that we can arbitrarily fix the values of the person’s utility function at the two points x = 0 and x = 100, as long as we use values such that U(0) < U(100). For convenience, we shall assume that U(0) = 0 and U(100) = 100.

Now determine a value $x_1$ such that the person is indifferent between accepting a gamble from which the gain will be either 100 dollars with probability 1/2 or 0 dollars with probability 1/2 and accepting $x_1$ dollars as a sure thing. For this value $x_1$, we must have

$$U(x_1) = \frac{1}{2}U(0) + \frac{1}{2}U(100) = \frac{1}{2} \cdot 0 + \frac{1}{2} \cdot 100 = 50.$$

Hence, $U(x_1) = 50$.

Next, we might determine a value $x_2$ such that the person is indifferent between accepting a gamble from which the gain will be either $x_1$ dollars with probability 1/2 or 0 dollars with probability 1/2 and accepting $x_2$ dollars as a sure thing. For this value $x_2$, we must have

$$U(x_2) = \frac{1}{2}U(x_1) + \frac{1}{2}U(0) = \frac{1}{2} \cdot 50 + \frac{1}{2} \cdot 0 = 25.$$

Hence, $U(x_2) = 25$.

Similarly, we can determine a value $x_3$ such that the person is indifferent between accepting a gamble from which the gain will be either $x_1$ dollars with probability 1/2 or 100 dollars with probability 1/2 and accepting $x_3$ dollars as a sure thing. For this value $x_3$, we must have

$$U(x_3) = \frac{1}{2}U(x_1) + \frac{1}{2}U(100) = \frac{1}{2} \cdot 50 + \frac{1}{2} \cdot 100 = 75.$$

Hence, $U(x_3) = 75$.

By continuing in this way, arbitrarily many points on a person’s utility function can be determined and the curve U(x) for $0 \le x \le 100$ can then be sketched. The difficulty is in having the person determine the values of $x_1, x_2, x_3$, etc., honestly and accurately in a hypothetical situation where he will not actually have to gamble. For this reason, it is necessary to check and recheck the values that are determined. For example, since

$$U(x_1) = \frac{1}{2}U(x_2) + \frac{1}{2}U(x_3),$$

the person should be indifferent between accepting $x_1$ dollars as a sure thing and accepting a gamble from which the gain will be either $x_2$ dollars with probability 1/2 or $x_3$ dollars with probability 1/2. By repeatedly carrying out checks of this type and allowing the person to adjust his answers, a reasonably accurate representation of a person’s utility function can usually be obtained.


Solutions to Exercises 


1. The utility of not buying the ticket is U(0) = 0. If the decision maker buys the ticket, the utility is U(499) if the ticket is a winner and U(−1) if the ticket is a loser. That is, the utility is $499^{\alpha}$ with probability 0.001 and it is −1 with probability 0.999. The expected utility is then $0.001 \times 499^{\alpha} - 0.999$. The decision maker prefers buying the ticket if this expected utility is greater than 0. Setting the expected utility greater than 0 means $499^{\alpha} > 999$. Taking logarithms of both sides yields $\alpha > 1.11$.
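The threshold on the exponent follows from taking logarithms of the inequality 499^α > 999. The short sketch below evaluates it and checks the sign of the expected utility on either side; `expected_utility` is an illustrative helper built from the expression in the solution.

```python
from math import log

def expected_utility(alpha):
    """0.001 * 499**alpha - 0.999, the expected utility of buying the ticket."""
    return 0.001 * 499**alpha - 0.999

# 499**alpha > 999 is equivalent to alpha > log(999) / log(499).
alpha_star = log(999) / log(499)

print(round(alpha_star, 4), expected_utility(1.0), expected_utility(1.2))
```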


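2.

$$\begin{aligned}
E[U(X)] &= \frac{1}{2} \cdot 5^2 + \frac{1}{2} \cdot 25^2 = 325, \\
E[U(Y)] &= \frac{1}{2} \cdot 10^2 + \frac{1}{2} \cdot 20^2 = 250, \\
E[U(Z)] &= 15^2 = 225.
\end{aligned}$$

Hence, X is preferred.

3.

$$\begin{aligned}
E[U(X)] &= \frac{1}{2}\sqrt{5} + \frac{1}{2}\sqrt{25} = 3.618, \\
E[U(Y)] &= \frac{1}{2}\sqrt{10} + \frac{1}{2}\sqrt{20} = 3.817, \\
E[U(Z)] &= \sqrt{15} = 3.873.
\end{aligned}$$

Hence, Z is preferred.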
4. For any gamble X, E[U(X)] = aE(X) + b. Therefore, among any set of gambles, the one for which the expected gain is largest will be preferred. We have

$$\begin{aligned}
E(X) &= \frac{1}{2} \cdot 5 + \frac{1}{2} \cdot 25 = 15, \\
E(Y) &= \frac{1}{2} \cdot 10 + \frac{1}{2} \cdot 20 = 15, \\
E(Z) &= 15.
\end{aligned}$$

Hence, all three gambles are equally preferred.
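Exercises 2 through 4 compare the same three gambles under a convex, a concave, and a linear utility. The sketch below recomputes the expected utilities; the linear constants a = 2 and b = 1 are arbitrary values chosen for illustration, since the solution to Exercise 4 holds for any a > 0 and b.

```python
from math import sqrt

# Gambles as lists of (gain, probability), read off the solutions above.
gambles = {
    "X": [(5, 0.5), (25, 0.5)],
    "Y": [(10, 0.5), (20, 0.5)],
    "Z": [(15, 1.0)],
}

def expected_utility(gamble, u):
    return sum(p * u(x) for x, p in gamble)

def ranking(u):
    """Expected utility of each gamble under the utility function u."""
    return {name: expected_utility(g, u) for name, g in gambles.items()}

convex = ranking(lambda x: x**2)          # Exercise 2
concave = ranking(sqrt)                   # Exercise 3
linear = ranking(lambda x: 2 * x + 1)     # Exercise 4 with illustrative a = 2, b = 1

print(convex, concave, linear)
```

The convex utility favors the riskiest gamble X, the concave utility favors the sure amount Z, and the linear utility is indifferent because all three gambles have the same mean.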
5. Since the person is indifferent between the gamble and the sure thing,

$$U(50) = \frac{1}{3}U(0) + \frac{2}{3}U(100) = \frac{1}{3} \cdot 0 + \frac{2}{3} \cdot 1 = \frac{2}{3}.$$
6. Since the person is indifferent between X and Y, E[U(X)] = E[U(Y)]. Therefore,
(0.6)U(—1) + (0.2)U(0) + (0.2)U(2) = (0.9)U(0) + (0.1)U(1). 
It follows from the given values that U(—1) = 23/6. 
7. For any given value of a,

$$E[U(X)] = p \log a + (1 - p)\log(1 - a).$$

The maximum of this expected utility can be found by elementary differentiation. We have

$$\frac{\partial E[U(X)]}{\partial a} = \frac{p}{a} - \frac{1 - p}{1 - a}.$$

When this derivative is set equal to 0, we find that a = p. Since

$$\frac{\partial^2 E[U(X)]}{\partial a^2} = -\frac{p}{a^2} - \frac{1 - p}{(1 - a)^2} < 0,$$

it follows that E[U(X)] is a maximum when a = p.
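8. For any given value of a,

$$E[U(X)] = pa^{1/2} + (1 - p)(1 - a)^{1/2}.$$

Therefore,

$$\frac{\partial E[U(X)]}{\partial a} = \frac{p}{2a^{1/2}} - \frac{1 - p}{2(1 - a)^{1/2}}.$$

When this derivative is set equal to 0, we find that

$$a = \frac{p^2}{p^2 + (1 - p)^2}.$$

Since $\frac{\partial^2 E[U(X)]}{\partial a^2} < 0$, it follows that E[U(X)] is a maximum at this value of a.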
[Figure S.4.2: Figure for Exercise 9 of Sec. 4.8. Sketches (i) and (ii).]


9. For any given value of a,

$$E[U(X)] = pa + (1 - p)(1 - a).$$

This is a linear function of a. If p < 1/2, it has the form shown in sketch (i) of Fig. S.4.2. Therefore, E[U(X)] is a maximum when a = 0. If p > 1/2, it has the form shown in sketch (ii) of Fig. S.4.2. Therefore, E[U(X)] is a maximum when a = 1. If p = 1/2, then E[U(X)] = 1/2 for all values of a.


10. The person will prefer $X_3$ to $X_4$ if and only if

$$E[U(X_3)] = (0.3)U(0) + (0.3)U(1) + (0.4)U(2) > E[U(X_4)] = (0.5)U(0) + (0.5)U(2).$$

Therefore, the person will prefer $X_3$ to $X_4$ if and only if

(0.2)U(0) − (0.3)U(1) + (0.1)U(2) < 0.

Since the person prefers $X_1$ to $X_2$, we know that

$$E[U(X_1)] = (0.2)U(0) + (0.5)U(1) + (0.3)U(2) > E[U(X_2)] = (0.4)U(0) + (0.2)U(1) + (0.4)U(2),$$

which implies that

(0.2)U(0) − (0.3)U(1) + (0.1)U(2) < 0.

This is precisely the inequality which was needed to conclude that the person will prefer $X_3$ to $X_4$.
11. For any given value of b,

$$E[U(X)] = p\log(A + b) + (1 - p)\log(A - b).$$

Therefore,

$$\frac{\partial E[U(X)]}{\partial b} = \frac{p}{A + b} - \frac{1 - p}{A - b}.$$

When this derivative is set equal to 0, we find that

$$b = (2p - 1)A.$$

Since $\frac{\partial^2 E[U(X)]}{\partial b^2} < 0$, this value of b does yield a maximum value of E[U(X)]. If $p \ge 1/2$, this value of b lies between 0 and A as required. However, if p < 1/2, this value of b is negative and not permissible. In this case, it can be shown that the maximum value of E[U(X)] for $0 \le b \le A$ occurs when b = 0; that is, when the person does not bet at all.
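The optimal bet under logarithmic utility can be confirmed by a simple grid search. This is a sketch with hypothetical values A = 100 and p = 0.7 or p = 0.3 (not from the exercise); the grid stops short of b = A because log(0) is undefined.

```python
from math import log

def expected_log_utility(b, p, A):
    """p * log(A + b) + (1 - p) * log(A - b)."""
    return p * log(A + b) + (1 - p) * log(A - b)

def best_bet(p, A, steps=10_000):
    """Grid search over 0 <= b < A for the bet that maximizes expected utility."""
    grid = [A * i / steps for i in range(steps)]
    return max(grid, key=lambda b: expected_log_utility(b, p, A))

favorable = best_bet(0.7, 100.0)     # analytic answer: (2p - 1) * A = 40
unfavorable = best_bet(0.3, 100.0)   # analytic answer: do not bet

print(favorable, unfavorable)
```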


12. For any given value of b,

$$E[U(X)] = p(A + b)^{1/2} + (1 - p)(A - b)^{1/2}.$$

Therefore,

$$\frac{\partial E[U(X)]}{\partial b} = \frac{p}{2(A + b)^{1/2}} - \frac{1 - p}{2(A - b)^{1/2}}.$$

When this derivative is set equal to 0, we find that

$$b = \frac{p^2 - (1 - p)^2}{p^2 + (1 - p)^2}\,A.$$

As in Exercise 11, if $p \ge 1/2$, then this value of b lies in the interval $0 \le b \le A$ and will maximize E[U(X)]. However, if p < 1/2, the value of b in the interval $0 \le b \le A$ for which E[U(X)] is a maximum is b = 0.


13. For any given value of b,

$$E[U(X)] = p(A + b) + (1 - p)(A - b).$$

This is a linear function of b. If p > 1/2, it has the form shown in sketch (i) of Fig. S.4.3 and b = A is best. If p < 1/2, it has the form shown in sketch (ii) of Fig. S.4.3 and b = 0 is best. If p = 1/2, E[U(X)] = A for all values of b.


14. For any given value of b,

$$E[U(X)] = p(A + b)^2 + (1 - p)(A - b)^2.$$

This is a parabola in b. If $p \ge 1/2$, it has the shape shown in sketch (i) of Fig. S.4.4. Therefore, E[U(X)] is a maximum for b = A. If 1/4 < p < 1/2, it has the shape shown in sketch (ii) of Fig. S.4.4. Therefore, E[U(X)] is again a maximum for b = A. If 0 < p < 1/4, it has the shape shown in sketch (iii) of Fig. S.4.4. Therefore, E[U(X)] is a maximum for b = 0. Finally, if p = 1/4, then it is symmetric with respect to the point b = A/2, as shown in sketch (iv) of Fig. S.4.4. Therefore, E[U(X)] is a maximum for b = 0 and b = A.
[Figure S.4.3: Figure for Exercise 13 of Sec. 4.8. Sketches (i) and (ii).]


15. The expected utility for the lottery ticket is

$$E[U(X)] = \int_0^1 x^{\alpha}\,dx = \frac{1}{\alpha + 1}.$$

The utility of accepting $x_0$ dollars instead of the lottery ticket is $U(x_0) = x_0^{\alpha}$. Therefore, the person will prefer to sell the lottery ticket for $x_0$ dollars if

$$x_0^{\alpha} > \frac{1}{\alpha + 1} \quad \text{or if} \quad x_0 > \frac{1}{(\alpha + 1)^{1/\alpha}}.$$

It can be shown that the right-hand side of this last inequality is an increasing function of $\alpha$.
16. The expected utility from choosing the prediction d is

$$E[U(-[Y - d]^2)] = -E(|Y - d|).$$

We already saw (in Sec. 4.5) that d equal to a median of the distribution of Y minimizes $E(|Y - d|)$, and hence maximizes this expected utility.


17. The gain is $10^6$ if P > 1/2 and $-10^6$ if $P \le 1/2$. The utility of continuing to promote is then $10^{5.4}$ if P > 1/2 and $-10^6$ if $P \le 1/2$. To find the expected utility, we need $\Pr(P \le 1/2)$. Using the stated p.d.f. for P, we get $\Pr(P \le 1/2) = \int_0^{1/2} 56p^6(1 - p)\,dp = 0.03516$. The expected utility is then $10^{5.4} \times (1 - 0.03516) - 10^6 \times 0.03516 = 207197$. This is greater than 0, so we would continue to promote the treatment.
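The probability and the expected utility can be recomputed directly. This sketch assumes the p.d.f. 56p^6(1 − p), which is consistent with the quoted probability 0.03516, and uses the two utility values stated in the solution; the result differs from 207197 by a few units only because the solution rounds the probability to five decimals.

```python
from fractions import Fraction as F

# Pr(P <= 1/2) for the assumed p.d.f. 56 p^6 (1 - p), integrated term by term.
half = F(1, 2)
prob_fail = 56 * (half**7 / 7 - half**8 / 8)

u_gain = 10**5.4        # utility of a gain of 10^6, as used in the solution
u_loss = -(10**6)       # utility of a loss of 10^6
expected_utility = u_gain * (1 - float(prob_fail)) + u_loss * float(prob_fail)

print(prob_fail, float(prob_fail), expected_utility)
```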


4.9 Supplementary Exercises 


Solutions to Exercises 


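1. If $u \ge 0$,

$$\int_u^{\infty} x f(x)\,dx \ge u \int_u^{\infty} f(x)\,dx = u[1 - F(u)].$$

Since

$$\lim_{u \to \infty} \int_{-\infty}^{u} x f(x)\,dx = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx < \infty,$$

it follows that

$$\lim_{u \to \infty} \left[E(X) - \int_{-\infty}^{u} x f(x)\,dx\right] = \lim_{u \to \infty} \int_u^{\infty} x f(x)\,dx = 0.$$

Hence, $\lim_{u \to \infty} u[1 - F(u)] = 0$.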
[Figure S.4.4: Figure for Exercise 14 of Sec. 4.8. Sketches (i), (ii), (iii), and (iv).]


2. We use integration by parts. Let u = 1 − F(x) and dv = dx. Then du = −f(x)dx and v = x, and the integral given in this exercise becomes

$$[uv]_0^{\infty} - \int_0^{\infty} v\,du = \int_0^{\infty} x f(x)\,dx = E(X).$$


3. Let $x_1, x_2, \ldots$ denote the possible values of X. Since F(x) is a step function, the integral given in Exercise 2 becomes the following sum:

$$\begin{aligned}
(x_1 - 0) &+ [1 - f(x_1)](x_2 - x_1) + [1 - f(x_1) - f(x_2)](x_3 - x_2) + \cdots \\
&= x_1 f(x_1) + x_2 f(x_2) + x_3 f(x_3) + \cdots \\
&= E(X).
\end{aligned}$$


4. If X, Y, and Z each had the required uniform distribution, then

$$E(X + Y + Z) = E(X) + E(Y) + E(Z) = \frac{1}{2} + \frac{1}{2} + \frac{1}{2} = \frac{3}{2}.$$

But since $X + Y + Z \le 1.3$, this is impossible.
5. We need $E(Y) = a\mu + b = 0$ and

$$\mathrm{Var}(Y) = a^2\sigma^2 = 1.$$

Therefore, $a = \pm\frac{1}{\sigma}$ and $b = -a\mu$.
6. The p.d.f. $h_1(w)$ of the range W is given at the end of Sec. 3.9. Therefore,

$$E(W) = n(n - 1)\int_0^1 w^{n-1}(1 - w)\,dw = \frac{n - 1}{n + 1}.$$


7. The dealer’s expected gain is

$$E(Y - X) = \frac{1}{36}\int_0^6\!\!\int_0^y (y - x)\,x\,dx\,dy = \frac{3}{2}.$$


8. It follows from Sec. 3.9 that the p.d.f. of $Y_n$ is

$$g_n(y) = n[F(y)]^{n-1} f(y).$$

Here, $F(y) = \int_0^y 2x\,dx = y^2$, so

$$g_n(y) = 2ny^{2n-1} \quad \text{for } 0 < y < 1.$$

Hence, $E(Y_n) = \int_0^1 y\,g_n(y)\,dy = \frac{2n}{2n + 1}$.
9. Suppose first that r(X) is nondecreasing. Then

$$\Pr[Y \ge r(m)] = \Pr[r(X) \ge r(m)] \ge \Pr(X \ge m) \ge \frac{1}{2},$$

and

$$\Pr[Y \le r(m)] = \Pr[r(X) \le r(m)] \ge \Pr(X \le m) \ge \frac{1}{2}.$$

Hence, r(m) is a median of the distribution of Y. If r(X) is nonincreasing, then

$$\Pr[Y \ge r(m)] \ge \Pr(X \le m) \ge \frac{1}{2}$$

and

$$\Pr[Y \le r(m)] \ge \Pr(X \ge m) \ge \frac{1}{2}.$$


10. Since m is the median of a continuous distribution,

$$\Pr(X < m) = \Pr(X > m) = \frac{1}{2}.$$

Hence,

$$\Pr(Y_n > m) = 1 - \Pr(\text{All } X_i\text{'s} < m) = 1 - \frac{1}{2^n}.$$
11. Suppose that you order s liters. If the demand is x < s, you will make a profit of gx cents on the x liters sold and suffer a loss of c(s − x) cents on the s − x liters that you do not sell. Therefore, your net profit will be gx − c(s − x) = (g + c)x − cs. If the demand is $x \ge s$, then you will sell all s liters and make a profit of gs cents. Hence, your expected net gain is

$$\begin{aligned}
E &= \int_0^s [(g + c)x - cs] f(x)\,dx + gs\int_s^{\infty} f(x)\,dx \\
&= \int_0^s (g + c)x f(x)\,dx - csF(s) + gs[1 - F(s)].
\end{aligned}$$

To find the value of s that maximizes E, we find, after some calculations, that

$$\frac{dE}{ds} = g - (g + c)F(s).$$

Thus, $\frac{dE}{ds} = 0$ and E is maximized when s is chosen so that F(s) = g/(g + c).
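The optimality condition F(s) = g/(g + c) can be illustrated numerically. The sketch below assumes an exponential demand distribution and hypothetical values g = 5, c = 2 (none of these come from the exercise), evaluates the expected gain with a midpoint rule, and compares the critical quantity with nearby order sizes.

```python
from math import exp, log

g, c = 5.0, 2.0      # hypothetical profit and loss per liter, in cents
rate = 0.01          # hypothetical demand distribution: F(x) = 1 - exp(-rate * x)

def expected_gain(s, n=20_000):
    """E = int_0^s [(g + c)x - c*s] f(x) dx + g*s*[1 - F(s)], via the midpoint rule."""
    h = s / n
    integral = h * sum(
        ((g + c) * x - c * s) * rate * exp(-rate * x)
        for x in (h * (i + 0.5) for i in range(n))
    )
    return integral + g * s * exp(-rate * s)

# Solve F(s) = g / (g + c) for the exponential c.d.f.
s_star = -log(1 - g / (g + c)) / rate

print(s_star, expected_gain(s_star), expected_gain(0.9 * s_star), expected_gain(1.1 * s_star))
```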


12. Suppose that you return at time t. If the machine has failed at time $x \le t$, then your cost is c(t − x). If the machine has not yet failed (x > t), then your cost is b. Therefore, your expected cost is

$$E = \int_0^t c(t - x) f(x)\,dx + b\int_t^{\infty} f(x)\,dx = ctF(t) - c\int_0^t x f(x)\,dx + b[1 - F(t)].$$

Hence,

$$\frac{dE}{dt} = cF(t) - bf(t),$$

and E will be minimized at a time t such that cF(t) = bf(t).
13. E(Z) = 5(3) − 1 + 15 = 29 in all three parts of this exercise. Also,

Var(Z) = 25 Var(X) + Var(Y) − 10 Cov(X, Y) = 109 − 10 Cov(X, Y).

Hence, Var(Z) = 109 in parts (a) and (b). In part (c),

$$\mathrm{Cov}(X, Y) = \rho\sigma_X\sigma_Y = (.25)(2)(3) = 1.5,$$

so Var(Z) = 94.


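14. In this exercise, $\sum_{j=1}^n Y_j = X_n - X_0$. Therefore,

$$\mathrm{Var}(\overline{Y}_n) = \frac{1}{n^2}\mathrm{Var}(X_n - X_0).$$

Since $X_n$ and $X_0$ are independent,

$$\mathrm{Var}(X_n - X_0) = \mathrm{Var}(X_n) + \mathrm{Var}(X_0).$$

Hence, $\mathrm{Var}(\overline{Y}_n) = \frac{2\sigma^2}{n^2}$.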
15. Let $v^2 = \mathrm{Var}(X_1 + \cdots + X_n) = \sum_i \mathrm{Var}(X_i) + 2\sum\sum_{i<j} \mathrm{Cov}(X_i, X_j)$. In this problem $\mathrm{Var}(X_i) = \sigma^2$ for all i and $\mathrm{Cov}(X_i, X_j) = \rho\sigma^2$ for all $i \ne j$. Therefore,

$$v^2 = n\sigma^2 + n(n - 1)\rho\sigma^2.$$

Since $v^2 \ge 0$, it follows that $\rho \ge -1/(n - 1)$.


16. Since the correlation is unaffected by a translation of the distribution of X and Y in the xy-plane, we can assume without loss of generality that the origin is at the center of the rectangle. Hence, by symmetry, E(X) = E(Y) = 0. But it also follows from symmetry that E(XY) = 0 because, for any positive value of XY in the first or third quadrant, there is a corresponding negative value in the second or fourth quadrant with the same constant density. Thus, Cov(X, Y) = 0 and $\rho(X, Y) = 0$.


More directly, one can argue that the joint p.d.f. of (X,Y) factors into constants times the indicator 
functions of the two intervals that define the sides of the rectangles, hence X and Y are independent 
and uncorrelated. 


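17. For i = 1, ..., n, let $X_i = 1$ if the ith letter is placed in the correct envelope and let $X_i = 0$ otherwise. Then $E(X_i) = 1/n$ and, for $i \ne j$,

$$E(X_iX_j) = \Pr(X_iX_j = 1) = \Pr(X_i = 1 \text{ and } X_j = 1) = \frac{1}{n(n - 1)}.$$

Also, $E(X_i^2) = E(X_i) = 1/n$. Hence,

$$\mathrm{Var}(X_i) = \frac{1}{n} - \frac{1}{n^2} = \frac{n - 1}{n^2}$$

and $\mathrm{Cov}(X_i, X_j) = \frac{1}{n(n - 1)} - \frac{1}{n^2} = \frac{1}{n^2(n - 1)}$. The total number of correct matches is $X = \sum_{i=1}^n X_i$. Therefore,

$$\mathrm{Var}(X) = \sum_{i=1}^n \mathrm{Var}(X_i) + \sum\sum_{i \ne j} \mathrm{Cov}(X_i, X_j) = n \cdot \frac{n - 1}{n^2} + n(n - 1) \cdot \frac{1}{n^2(n - 1)} = 1.$$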
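For small n the result can be confirmed by brute force, treating each assignment of letters to envelopes as an equally likely permutation and counting fixed points. The following is a minimal sketch; the function name `match_moments` is illustrative.

```python
from fractions import Fraction as F
from itertools import permutations

def match_moments(n):
    """Exact mean and variance of the number of correct matches among n letters."""
    counts = [
        sum(1 for i, j in enumerate(perm) if i == j)   # letters in their own envelope
        for perm in permutations(range(n))
    ]
    total = len(counts)
    mean = F(sum(counts), total)
    var = F(sum(k * k for k in counts), total) - mean**2
    return mean, var

for n in range(2, 7):
    print(n, match_moments(n))
```

Both the mean and the variance equal 1 for every n ≥ 2 tried here, in agreement with the calculation above.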


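18.

$$\begin{aligned}
E[(X - \mu)^3] &= E(X^3) - 3\mu E(X^2) + 3\mu^2 E(X) - \mu^3 \\
&= E(X^3) - 3\mu(\sigma^2 + \mu^2) + 3\mu^3 - \mu^3 \\
&= E(X^3) - 3\mu\sigma^2 - \mu^3.
\end{aligned}$$

19. $c'(t) = \dfrac{\psi'(t)}{\psi(t)}$ and $c''(t) = \dfrac{\psi(t)\psi''(t) - [\psi'(t)]^2}{[\psi(t)]^2}$.

Since $\psi(0) = 1$, $\psi'(0) = \mu$, and $\psi''(0) = E(X^2) = \sigma^2 + \mu^2$, it follows that $c'(0) = \mu$ and $c''(0) = \sigma^2$.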


20. It was shown in Exercise 12 of Sec. 4.7 that if E(Y | X) = aX + b, then

$$a = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)} = \frac{\rho\sigma_Y}{\sigma_X}$$

and $b = \mu_Y - a\mu_X$. The desired result now follows immediately.
21. Since the coefficient of X in E(Y | X) is negative, it follows from Exercise 20 that $\rho < 0$. Furthermore, it follows from Exercise 20 that the product of the coefficients of X and Y in E(Y | X) and E(X | Y) must be $\rho^2$. Hence, $\rho^2 = 1/4$ and, since $\rho < 0$, $\rho = -1/2$.


22. Let X and Y denote the lengths of the longer and shorter pieces, respectively. Since Y = 3 − X with probability 1, it follows that $\rho = -1$.


23.

$$\mathrm{Cov}(X, X + bY) = \mathrm{Var}(X) + b\,\mathrm{Cov}(X, Y) = 1 + b\rho,$$

$$\mathrm{Var}(X) = 1, \quad \mathrm{Var}(X + bY) = 1 + b^2 + 2b\rho.$$

Hence,

$$\rho(X, X + bY) = \frac{1 + b\rho}{(1 + b^2 + 2b\rho)^{1/2}}.$$

If we set this quantity equal to $\rho$, square both sides, and solve for b, we obtain $b = -1/(2\rho)$.


24. The p.f. of the distribution of employees is
f(0) = .1, f(1) = .2, f(3) = .3, and f(5) = .4.


(a) The unique median of this distribution is 3, so the new office should be located at the point 3. 


(b) The mean of this distribution is (.1)(0) + (.2)(1) + (.3)(3) + (.4)(5) = 3.1, so the new office should 
be located at the point 3.1. 


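25. (a) The marginal p.d.f. of X is

$$f_1(x) = \int_0^x 8xy\,dy = 4x^3 \quad \text{for } 0 < x < 1.$$

Therefore, the conditional p.d.f. of Y given that X = .2 is

$$g_1(y \mid X = .2) = \frac{f(.2, y)}{f_1(.2)} = 50y \quad \text{for } 0 < y < .2.$$

The mean of this distribution is

$$E(Y \mid X = .2) = \int_0^{.2} 50y^2\,dy = \frac{2}{15} = .1333.$$

(b) The median of $g_1(y \mid X = .2)$ is $m = \left(\frac{1}{50}\right)^{1/2} = .1414$.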


26.

$$\begin{aligned}
\mathrm{Cov}(X, Y) &= E[(X - \mu_X)(Y - \mu_Y)] \\
&= E\{[X - E(X \mid Z) + E(X \mid Z) - \mu_X][Y - E(Y \mid Z) + E(Y \mid Z) - \mu_Y]\} \\
&= E\{[X - E(X \mid Z)][Y - E(Y \mid Z)]\} + E\{[X - E(X \mid Z)][E(Y \mid Z) - \mu_Y]\} \\
&\quad + E\{[E(X \mid Z) - \mu_X][Y - E(Y \mid Z)]\} + E\{[E(X \mid Z) - \mu_X][E(Y \mid Z) - \mu_Y]\}.
\end{aligned}$$

Consider these final four expectations. In the first one, if we first calculate the conditional expectation given Z and then take the expectation over Z we obtain E[Cov(X, Y | Z)]. In the second and third expectations, we obtain the value 0 when we take the conditional expectation given Z. The fourth expectation is Cov[E(X | Z), E(Y | Z)].
27. Let $N$ be the number of balls in the box. Since the proportion of red balls is $p$, there are $Np$ red balls
in the box. (Clearly, $p$ must be an integer multiple of $1/N$.) There are $N(1 - p)$ blue balls in the box.
Let $K = Np$ so that there are $N - K$ blue balls and $K$ red balls. If $n > K$, then $\Pr(Y = n) = 0$ since
there are not enough red balls. Since $\Pr(X = n) > 0$ for all $n$, the result is true if $n > K$. For $n \le K$,
let $X_i = 1$ if the $i$th ball is red for $i = 1, \ldots, n$. For sampling without replacement,
$$\Pr(Y = n) = \Pr(X_1 = 1) \prod_{i=2}^{n} \Pr(X_i = 1 \mid X_1 = 1, \ldots, X_{i-1} = 1) = \frac{K(K-1)\cdots(K-n+1)}{N(N-1)\cdots(N-n+1)}. \tag{S.4.4}$$
For sampling with replacement, the $X_i$'s are independent, so
$$\Pr(X = n) = \prod_{i=1}^{n} \Pr(X_i = 1) = \left(\frac{K}{N}\right)^n. \tag{S.4.5}$$
For $j = 1, \ldots, n-1$, $KN - jN < KN - jK$, so $(K - j)/(N - j) < K/N$. Hence the product in (S.4.4)
is smaller than the product in (S.4.5). This argument makes sense only if $N$ is finite. If $N$ is infinite,
then sampling with and without replacement are equivalent.


28. The expected utility from the gamble $X$ is $E[U(X)] = E(X^2)$. The utility of receiving $E(X)$ is
$U[E(X)] = [E(X)]^2$. We know from Theorem 4.3.1 that $E(X^2) \ge [E(X)]^2$ for any gamble $X$, and
from Theorem 4.3.3 that there is strict inequality unless $X$ is actually constant with probability 1.


29. The expected utility from allocating the amounts $a$ and $m - a$ is
$$\begin{aligned}
E &= p\log(g_1 a) + (1 - p)\log[g_2(m - a)] \\
&= p\log a + (1 - p)\log(m - a) + p\log g_1 + (1 - p)\log g_2.
\end{aligned}$$
The maximum over all values of $a$ can now be found by elementary differentiation, as in Exercise 7 of
Sec. 4.8, and we obtain $a = pm$.
Special Distributions 


5.2 The Bernoulli and Binomial Distributions 


Commentary 


If one is using the statistical software R, then the functions dbinom, pbinom, and qbinom give the p.f., the
c.d.f., and the quantile function of binomial distributions. The syntax is that the first argument is the
argument of the function, and the next two are n and p, respectively. The function rbinom gives a random
sample of binomial random variables. The first argument is how many you want, and the next two are n and
p. All of the solutions that require the calculation of binomial probabilities can be done using these functions
instead of tables.
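
For readers without R, the same table lookups can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: `binom_pf` and `binom_cdf` are illustrative names chosen here to mirror dbinom and pbinom, and they are not functions from any library.

```python
from math import comb

def binom_pf(x, n, p):
    """p.f. of the binomial distribution with parameters n and p (analog of R's dbinom)."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(x, n, p):
    """c.d.f. of the binomial distribution at an integer x (analog of R's pbinom)."""
    return sum(binom_pf(k, n, p) for k in range(0, min(x, n) + 1))

# Exercise 3 of this section: strictly more heads than tails in 10 fair tosses.
more_heads = (1 - binom_pf(5, 10, 0.5)) / 2

# Exercise 4 of this section: Pr(6 <= X <= 9) with n = 15 and p = 0.4.
between_6_and_9 = binom_cdf(9, 15, 0.4) - binom_cdf(5, 15, 0.4)

print(more_heads, between_6_and_9)
```

The results agree with the table-based answers in the solutions up to the four-decimal rounding of the tables.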


Solutions to Exercises 


1. Since $E(X^k)$ has the same value for every positive integer $k$, we might try to find a random variable
$X$ such that $X, X^2, X^3, X^4, \ldots$ all have the same distribution. If $X$ can take only the values 0 and 1,
then $X^k = X$ for every positive integer $k$ since $0^k = 0$ and $1^k = 1$. If $\Pr(X = 1) = p = 1 - \Pr(X = 0)$,
then in order for $E(X^k) = 1/3$, as required, we must have $p = 1/3$. Therefore, a random variable $X$
such that $\Pr(X = 1) = 1/3$ and $\Pr(X = 0) = 2/3$ satisfies the required conditions.


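2. We wish to express $f(x)$ in the form $p^{\alpha(x)}(1 - p)^{\beta(x)}$, where $\alpha(x) = 1$ and $\beta(x) = 0$ for $x = a$, and
$\alpha(x) = 0$ and $\beta(x) = 1$ for $x = b$. If we choose $\alpha(x)$ and $\beta(x)$ to be linear functions of the form
$\alpha(x) = \alpha_1 + \alpha_2 x$ and $\beta(x) = \beta_1 + \beta_2 x$, then the following two pairs of equations must be satisfied:
$$\alpha_1 + \alpha_2 a = 1, \qquad \alpha_1 + \alpha_2 b = 0,$$
and
$$\beta_1 + \beta_2 a = 0, \qquad \beta_1 + \beta_2 b = 1.$$
Hence,
$$\alpha_1 = -\frac{b}{a-b}, \quad \alpha_2 = \frac{1}{a-b}, \quad \beta_1 = \frac{a}{a-b}, \quad \beta_2 = -\frac{1}{a-b}.$$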
3. Let $X$ be the number of heads obtained. Then strictly more heads than tails are obtained if $X \in \{6,7,8,9,10\}$.
The probability of this event is the sum of the numbers in the binomial table corresponding
to $p = 0.5$ and $n = 10$ for $k = 6, \ldots, 10$. By the symmetry of this binomial distribution, we
can also compute the sum as $(1 - \Pr(X = 5))/2 = (1 - 0.2461)/2 = 0.37695$.


4. It is found from a table of the binomial distribution with parameters $n = 15$ and $p = 0.4$ that
$$\begin{aligned}
\Pr(6 \le X \le 9) &= \Pr(X = 6) + \Pr(X = 7) + \Pr(X = 8) + \Pr(X = 9) \\
&= .2066 + .1771 + .1181 + .0612 = .5630.
\end{aligned}$$


5. The tables do not include the value $p = 0.6$, so we must use the trick described in Exercise 7 of
Sec. 3.1. The number of tails $X$ will have the binomial distribution with parameters $n = 9$ and $p = 0.4$.
Therefore,
$$\begin{aligned}
\Pr(\text{Even number of heads}) &= \Pr(\text{Odd number of tails}) \\
&= \Pr(X = 1) + \Pr(X = 3) + \Pr(X = 5) + \Pr(X = 7) + \Pr(X = 9) \\
&= .0605 + .2508 + .1672 + .0212 + .0003 \\
&= .5000.
\end{aligned}$$


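6. Let $N_A$, $N_B$, and $N_C$ denote the number of times each man hits the target. Then
$$\begin{aligned}
E(N_A + N_B + N_C) &= E(N_A) + E(N_B) + E(N_C) \\
&= 3 \cdot \frac{1}{8} + 5 \cdot \frac{1}{4} + 2 \cdot \frac{1}{2} = \frac{21}{8}.
\end{aligned}$$


7. If we assume that $N_A$, $N_B$, and $N_C$ are independent, then
$$\begin{aligned}
\operatorname{Var}(N_A + N_B + N_C) &= \operatorname{Var}(N_A) + \operatorname{Var}(N_B) + \operatorname{Var}(N_C) \\
&= 3 \cdot \frac{1}{8} \cdot \frac{7}{8} + 5 \cdot \frac{1}{4} \cdot \frac{3}{4} + 2 \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{113}{64}.
\end{aligned}$$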


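8. The number $X$ of components that fail will have the binomial distribution with parameters $n = 10$ and
$p = 0.2$. Therefore,
$$\Pr(X \ge 2 \mid X \ge 1) = \frac{\Pr(X \ge 2)}{\Pr(X \ge 1)} = \frac{1 - \Pr(X = 0) - \Pr(X = 1)}{1 - \Pr(X = 0)} = \frac{1 - .1074 - .2684}{1 - .1074} = \frac{.6242}{.8926} = .6993.$$


9. $$\Pr\left(X_1 = 1 \,\Big|\, \sum_{i=1}^{n} X_i = k\right) = \frac{\Pr\left(X_1 = 1 \text{ and } \sum_{i=1}^{n} X_i = k\right)}{\Pr\left(\sum_{i=1}^{n} X_i = k\right)} = \frac{\Pr\left(X_1 = 1 \text{ and } \sum_{i=2}^{n} X_i = k-1\right)}{\Pr\left(\sum_{i=1}^{n} X_i = k\right)}.$$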
Since the random variables $X_1, \ldots, X_n$ are independent, it follows that $X_1$ and $\sum_{i=2}^{n} X_i$ are independent.
Therefore, the final expression can be rewritten as
$$\frac{\Pr(X_1 = 1)\Pr\left(\sum_{i=2}^{n} X_i = k-1\right)}{\Pr\left(\sum_{i=1}^{n} X_i = k\right)}.$$
The sum $\sum_{i=2}^{n} X_i$ has the binomial distribution with parameters $n - 1$ and $p$, and the sum $\sum_{i=1}^{n} X_i$ has
the binomial distribution with parameters $n$ and $p$. Therefore,
$$\Pr\left(\sum_{i=2}^{n} X_i = k-1\right) = \binom{n-1}{k-1} p^{k-1}(1-p)^{(n-1)-(k-1)} = \binom{n-1}{k-1} p^{k-1}(1-p)^{n-k}$$
and
$$\Pr\left(\sum_{i=1}^{n} X_i = k\right) = \binom{n}{k} p^{k}(1-p)^{n-k}.$$
Also, $\Pr(X_1 = 1) = p$. It now follows that
$$\Pr\left(X_1 = 1 \,\Big|\, \sum_{i=1}^{n} X_i = k\right) = \frac{p\binom{n-1}{k-1} p^{k-1}(1-p)^{n-k}}{\binom{n}{k} p^{k}(1-p)^{n-k}} = \frac{k}{n}.$$


The number of children $X$ in the family who will inherit the disease has the binomial distribution with
parameters $n$ and $p$. Let $f(x \mid n, p)$ denote the p.f. of this distribution. Then
$$\Pr(X \ge 1) = 1 - \Pr(X = 0) = 1 - f(0 \mid n, p) = 1 - (1 - p)^n.$$
For $x = 1, 2, \ldots, n$,
$$\Pr(X = x \mid X \ge 1) = \frac{f(x \mid n, p)}{1 - (1 - p)^n}.$$
Therefore, the conditional p.f. of $X$ given that $X \ge 1$ is $f(x \mid n, p)/(1 - [1 - p]^n)$ for $x = 1, 2, \ldots, n$. The
required expectation $E(X \mid X \ge 1)$ is the mean of this conditional distribution. Therefore,
$$E(X \mid X \ge 1) = \frac{1}{1 - (1 - p)^n} \sum_{x=1}^{n} x f(x \mid n, p).$$
However, we know that the mean of the binomial distribution is $np$; i.e.,
$$E(X) = \sum_{x=0}^{n} x f(x \mid n, p) = np.$$
Furthermore, we can drop the term corresponding to $x = 0$ from this summation without affecting the
value of the summation, because the value of that term is 0. Hence, $\sum_{x=1}^{n} x f(x \mid n, p) = np$. It now follows
that $E(X \mid X \ge 1) = np/(1 - [1 - p]^n)$.
Since the value of the term being summed here will be 0 for $x = 0$ and for $x = 1$, we may change
the lower limit of the summation from $x = 2$ to $x = 0$, without affecting the value of the sum. The
summation can then be rewritten as
$$\sum_{x=0}^{n} x^2 \binom{n}{x} p^x (1-p)^{n-x} - \sum_{x=0}^{n} x \binom{n}{x} p^x (1-p)^{n-x}.$$
If $X$ has the binomial distribution with parameters $n$ and $p$, then the first summation is simply $E(X^2)$
and the second summation is simply $E(X)$. Finally,
$$E(X^2) - E(X) = \operatorname{Var}(X) + [E(X)]^2 - E(X) = np(1 - p) + (np)^2 - np = n(n - 1)p^2.$$


Assuming that $p$ is not 0 or 1,
$$\frac{f(x+1 \mid n, p)}{f(x \mid n, p)} = \frac{\binom{n}{x+1} p^{x+1}(1-p)^{n-x-1}}{\binom{n}{x} p^{x}(1-p)^{n-x}} = \frac{(n - x)p}{(x + 1)(1 - p)}.$$
Therefore,
$$\frac{f(x+1 \mid n, p)}{f(x \mid n, p)} \ge 1 \quad \text{if and only if } x \le (n+1)p - 1.$$
It follows from this relation that the values of $f(x \mid n, p)$ will increase as $x$ increases from 0 up to the
greatest integer less than $(n+1)p$, and will then decrease as $x$ continues increasing up to $n$. Therefore,
if $(n+1)p$ is not an integer, the unique mode will be the greatest integer less than $(n+1)p$. If $(n+1)p$
is an integer, then both $(n+1)p$ and $(n+1)p - 1$ are modes. If $p = 0$, the mode is 0 and if $p = 1$, the
mode is $n$.


Let $X$ be the number of successes in the group with probability 0.5 of success. Let $Y$ be the number
of successes in the group with probability 0.6 of success. We want $\Pr(X \ge Y)$. Both $X$ and $Y$ have
discrete (binomial) distributions with possible values $0, \ldots, 5$. There are 36 possible $(X, Y)$ pairs and
we need the sum of the probabilities of the 21 of them for which $X \ge Y$. To save time, we shall calculate
the probabilities of the 15 other ones and subtract the total from 1. Since $X$ and $Y$ are independent,
we can write $\Pr(X = x, Y = y) = \Pr(X = x)\Pr(Y = y)$, and find each of the factors in the binomial
table in the back of the book. For example, for $x = 1$ and $y = 2$, we get $0.1562 \times 0.2304 = 0.03599$.
Adding up all 15 of these and subtracting from 1 we get 0.4957.
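
The double sum described above is easy to check directly. The following Python sketch is an illustration rather than the method of the solution: it sums the 21 pairs with x >= y instead of subtracting the other 15 from 1, and the helper name `binom_pf` is chosen here.

```python
from math import comb

def binom_pf(x, n, p):
    """p.f. of the binomial distribution with parameters n and p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n = 5
# All (x, y) pairs with x >= y; X has success probability 0.5 and Y has 0.6.
pairs = [(x, y) for x in range(n + 1) for y in range(n + 1) if x >= y]
prob_x_ge_y = sum(binom_pf(x, n, 0.5) * binom_pf(y, n, 0.6) for x, y in pairs)

print(len(pairs), round(prob_x_ge_y, 4))
```

Summing over the 21 pairs directly avoids the rounding that accumulates when 15 four-decimal table products are added by hand.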


Before we prove the three facts, we shall show that they imply the desired result. According to (c),
every distribution with the specified moments must take only the values 0 and 1. The mean of such a
distribution is $\Pr(X = 1)$. This number, $\Pr(X = 1)$, uniquely determines every distribution that can
only take the two values 0 and 1.


(a) Suppose that $\Pr(|X| > 1) > 0$. Then there exists $\epsilon > 0$ such that $\Pr(|X| > 1 + \epsilon) > 0$. Then
$$E(X^{2k}) \ge (1 + \epsilon)^{2k} \Pr(|X| > 1 + \epsilon).$$
Since the right side of this equation goes to $\infty$ as $k \to \infty$, it cannot be the case that $E(X^{2k}) = 1/3$
for all $k$. This contradiction means that our assumption that $\Pr(|X| > 1) > 0$ must be false. That
is, $\Pr(|X| \le 1) = 1$.
(b) Since $X^4 < X^2$ whenever $|X| \le 1$ and $X^2 \notin \{0, 1\}$, it follows that $E(X^4) < E(X^2)$ whenever
$\Pr(|X| \le 1) = 1$ and $\Pr(X^2 \notin \{0, 1\}) > 0$. Since we know that $E(X^4) = E(X^2)$ and $\Pr(|X| \le 1) = 1$,
it must be that $\Pr(X^2 \notin \{0, 1\}) = 0$. That is, $\Pr(X^2 \in \{0, 1\}) = 1$.
(c) From (b) we know that $\Pr(X \in \{-1, 0, 1\}) = 1$. We also know that
$$\begin{aligned}
E(X) &= \Pr(X = 1) - \Pr(X = -1), \\
E(X^2) &= \Pr(X = 1) + \Pr(X = -1).
\end{aligned}$$
Since these two are equal, it follows that $\Pr(X = -1) = 0$.


We need the maximum number of tests if and only if every first-stage and second-stage subgroup has
at least one positive result. In that case, we would need 10 + 100 + 1000 = 1110 total tests. The
probability that we have to run this many tests is the probability that every $Y_{2,i,k} = 1$, which in turn
is the probability that every $Z_{2,i,k} > 0$. The $Z_{2,i,k}$'s are independent binomial random variables with
parameters 10 and 0.002, and there are 100 of them altogether. The probability that each is positive
is 0.0198, as computed in Example 5.2.7. The probability that they are all positive is
$(0.0198)^{100} = 4.64 \times 10^{-171}$.


We use notation like that in Example 5.2.7 with one extra stage. For $i = 1, \ldots, 5$, let $Z_{1,i}$ be the
number of people in group $i$ who test positive. Let $Y_{1,i} = 1$ if $Z_{1,i} > 0$ and $Y_{1,i} = 0$ if not. Then $Z_{1,i}$
has the binomial distribution with parameters 200 and 0.002, while $Y_{1,i}$ has the Bernoulli distribution
with parameter $1 - 0.998^{200} = 0.3299$. Let $Z_{2,i,k}$ be the number of people who test positive in the $k$th
subgroup of group $i$ for $k = 1, \ldots, 5$. Let $Y_{2,i,k} = 1$ if $Z_{2,i,k} > 0$ and $Y_{2,i,k} = 0$ if not. Each $Z_{2,i,k}$ has
the binomial distribution with parameters 40 and 0.002, while $Y_{2,i,k}$ has the Bernoulli distribution with
parameter $1 - 0.998^{40} = 0.0770$. Finally, let $Z_{3,i,k,j}$ be the number of people who test positive in the
$j$th sub-subgroup of the $k$th subgroup of the $i$th group. Let $Y_{3,i,k,j} = 1$ if $Z_{3,i,k,j} > 0$ and $Y_{3,i,k,j} = 0$
otherwise. Then $Z_{3,i,k,j}$ has the binomial distribution with parameters 8 and 0.002, while $Y_{3,i,k,j}$ has
the Bernoulli distribution with parameter $1 - 0.998^{8} = 0.0159$.


The maximum number of tests is needed if and only if there is at least one positive amongst every one of
the 125 sub-subgroups of size 8. In that case, we need to make 1000 + 125 + 25 + 5 = 1155 total tests. Let
$Y_1 = \sum_{i=1}^{5} Y_{1,i}$, which is the number of groups that need further attention. Let $Y_2 = \sum_{i=1}^{5}\sum_{k=1}^{5} Y_{2,i,k}$,
which is the number of subgroups that need further attention. Let $Y_3 = \sum_{i=1}^{5}\sum_{k=1}^{5}\sum_{j=1}^{5} Y_{3,i,k,j}$, which
is the number of sub-subgroups that need all 8 members tested. The actual number of tests needed is
$Y = 5 + 5Y_1 + 5Y_2 + 8Y_3$. The mean of $Y_1$ is $5 \times 0.3299 = 1.6497$. The mean of $Y_2$ is $25 \times 0.0770 = 1.9239$.
The mean of $Y_3$ is $125 \times 0.0159 = 1.9861$. The mean of $Y$ is then
$$E(Y) = 5 + 5 \times 1.6497 + 5 \times 1.9239 + 8 \times 1.9861 = 38.7569.$$
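
The expected number of tests can be recomputed without intermediate rounding. The Python sketch below is only an illustration of the arithmetic above; the function name `prob_positive` is an assumption made here, and the group sizes (200, 40, 8) and counts (5, 25, 125) are taken from the solution.

```python
# Three-stage group testing of 1000 people, each positive with probability 0.002.
p = 0.002

def prob_positive(size):
    """Probability that a group of the given size contains at least one positive."""
    return 1 - (1 - p) ** size

mean_y1 = 5 * prob_positive(200)    # 5 groups of 200 people
mean_y2 = 25 * prob_positive(40)    # 25 subgroups of 40 people
mean_y3 = 125 * prob_positive(8)    # 125 sub-subgroups of 8 people

# Y = 5 + 5*Y1 + 5*Y2 + 8*Y3, so E(Y) follows by linearity of expectation.
expected_tests = 5 + 5 * mean_y1 + 5 * mean_y2 + 8 * mean_y3

print(mean_y1, mean_y2, mean_y3, expected_tests)
```

Because expectation is linear, no independence between the stages is needed for this calculation.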


5.3. The Hypergeometric Distributions 


Commentary 


The hypergeometric distribution arises in finite population sampling and in some theoretical calculations. 
It actually does not figure in the remainder of this text, and this section could be omitted despite the fact 
that it is not marked with an asterisk. The section ends with a discussion of how to extend the definition of 
binomial coefficients in order to make certain formulas easier to write. This discussion is not central to the 
rest of the text. It does arise again in a theoretical discussion at the end of Sec. 5.5. 

If one is using the statistical software R, then the functions dhyper, phyper, and qhyper give the p.f.,
the c.d.f., and the quantile function of hypergeometric distributions. The syntax is that the first argument is
the argument of the function, and the next three are A, B, and n in the notation of the text. The function
rhyper gives a random sample of hypergeometric random variables. The first argument is how many you
want, and the next three are A, B, and n. All of the solutions that require the calculation of hypergeometric
probabilities can be done using these functions.


Solutions to Exercises 


1. Using Eq. (5.3.1) with the parameters $A = 10$, $B = 24$, and $n = 11$, we obtain the desired probability
$$\Pr(X = 10) = \frac{\binom{10}{10}\binom{24}{1}}{\binom{34}{11}} = 8.389 \times 10^{-8}.$$
2. Let $X$ denote the number of red balls that are obtained. Then $X$ has the hypergeometric distribution
with parameters $A = 5$, $B = 10$, and $n = 7$. The maximum value of $X$ is $\min\{n, A\} = 5$, hence,
$$\Pr(X \ge 3) = \sum_{x=3}^{5} \frac{\binom{5}{x}\binom{10}{7-x}}{\binom{15}{7}} = \frac{2745}{6435} \approx 0.4266.$$
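
Both hypergeometric answers can be reproduced with exact integer binomial coefficients. This is a minimal Python sketch; `hyper_pf` is an illustrative name that follows the (A, B, n) parameter order of the text, and it is not a library function.

```python
from math import comb

def hyper_pf(x, A, B, n):
    """p.f. of the hypergeometric distribution with parameters A, B, and n."""
    if x < max(0, n - B) or x > min(n, A):
        return 0.0
    return comb(A, x) * comb(B, n - x) / comb(A + B, n)

# Exercise 1: A = 10, B = 24, n = 11, Pr(X = 10).
ex1 = hyper_pf(10, 10, 24, 11)

# Exercise 2: A = 5, B = 10, n = 7, Pr(X >= 3).
ex2 = sum(hyper_pf(x, 5, 10, 7) for x in range(3, 6))

print(ex1, ex2)
```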


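3. As in Exercise 2, let $X$ denote the number of red balls in the sample. Then, by Eqs. (5.3.3) and (5.3.4),
$$E(X) = \frac{nA}{A+B} = \frac{7}{3} \quad \text{and} \quad \operatorname{Var}(X) = \frac{nAB}{(A+B)^2} \cdot \frac{A+B-n}{A+B-1} = \frac{8}{9}.$$
Since $\bar{X} = X/n$,
$$E(\bar{X}) = \frac{1}{n}E(X) = \frac{1}{3} \quad \text{and} \quad \operatorname{Var}(\bar{X}) = \frac{1}{n^2}\operatorname{Var}(X) = \frac{8}{441}.$$


4. By Eq. (5.3.4),
$$\operatorname{Var}(X) = \frac{(8)(20)}{(28)^2(27)}\, n(28 - n).$$
The quadratic function $n(28 - n)$ is a maximum when $n = 14$.


5. By Eq. (5.3.4),
$$\operatorname{Var}(X) = \frac{A(T - A)}{T^2(T - 1)}\, n(T - n).$$
If $T$ is an even integer, then the quadratic function $n(T - n)$ is a maximum when $n = T/2$. If $T$ is
an odd integer, then the maximum value of $n(T - n)$, for $n = 0, 1, 2, \ldots, T$, occurs at the two integers
$(T - 1)/2$ and $(T + 1)/2$.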
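6. For $x = 0, 1, \ldots, k$,
$$\Pr(X_1 = x \mid X_1 + X_2 = k) = \frac{\Pr(X_1 = x \text{ and } X_1 + X_2 = k)}{\Pr(X_1 + X_2 = k)} = \frac{\Pr(X_1 = x \text{ and } X_2 = k - x)}{\Pr(X_1 + X_2 = k)}.$$
Since $X_1$ and $X_2$ are independent,
$$\Pr(X_1 = x \text{ and } X_2 = k - x) = \Pr(X_1 = x)\Pr(X_2 = k - x).$$
Furthermore, it follows from a result given in Sec. 5.2 that $X_1 + X_2$ will have the binomial distribution
with parameters $n_1 + n_2$ and $p$. Therefore,
$$\begin{aligned}
\Pr(X_1 = x) &= \binom{n_1}{x} p^x (1-p)^{n_1 - x}, \\
\Pr(X_2 = k - x) &= \binom{n_2}{k-x} p^{k-x} (1-p)^{n_2 - k + x}, \\
\Pr(X_1 + X_2 = k) &= \binom{n_1 + n_2}{k} p^k (1-p)^{n_1 + n_2 - k}.
\end{aligned}$$
By substituting these values into the expression given earlier, we find that for $x = 0, 1, \ldots, k$,
$$\Pr(X_1 = x \mid X_1 + X_2 = k) = \frac{\binom{n_1}{x}\binom{n_2}{k-x}}{\binom{n_1 + n_2}{k}}.$$
It can be seen that this conditional distribution is a hypergeometric distribution with parameters $n_1$, $n_2$,
and $k$.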


7. (a) The probability of obtaining exactly $x$ defective items is
$$\frac{\binom{0.3T}{x}\binom{0.7T}{10 - x}}{\binom{T}{10}}.$$
Therefore, the probability of obtaining not more than one defective item is the sum of these
probabilities for $x = 0$ and $x = 1$. Since
$$\binom{0.3T}{0} = 1 \quad \text{and} \quad \binom{0.3T}{1} = 0.3T,$$
this sum is equal to
$$\frac{\binom{0.7T}{10} + 0.3T\binom{0.7T}{9}}{\binom{T}{10}}.$$
(b) The probability of obtaining exactly $x$ defectives, according to the binomial distribution, is
$$\binom{10}{x}(0.3)^x(0.7)^{10 - x}.$$
The desired probability is the sum of these probabilities for $x = 0$ and $x = 1$, which is
$$(0.7)^{10} + 10(0.3)(0.7)^{9}.$$
For a large value of $T$, the answers in (a) and (b) will be very close to each other, although this
fact is not obvious from the different forms of the two answers.
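
The closeness of the two answers for large T can be seen numerically. The Python sketch below evaluates the expression from part (a) for a few illustrative values of T (chosen here, not taken from the exercise) and compares it with the binomial answer from part (b). It assumes T is a multiple of 10, so that 0.3T and 0.7T are integers.

```python
from math import comb

def prob_at_most_one_defective(T):
    """Part (a): sample 10 items without replacement from T items, 0.3T of them defective.

    Assumes T is a multiple of 10 so that 0.3T and 0.7T are integers.
    """
    defective = 3 * T // 10
    good = T - defective
    return (comb(good, 10) + defective * comb(good, 9)) / comb(T, 10)

# Part (b): the binomial answer, which does not depend on T.
binomial_answer = 0.7**10 + 10 * 0.3 * 0.7**9

for T in (20, 100, 1000, 100000):
    print(T, prob_at_most_one_defective(T), binomial_answer)
```

Exact integer arithmetic in `comb` keeps the hypergeometric value accurate even when T is large.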


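8. If we let $X_i$ denote the height of the $i$th person selected, for $i = 1, \ldots, n$, then $X = X_1 + \cdots + X_n$.
Furthermore, since $X_i$ is equally likely to have any one of the $T$ values $a_1, \ldots, a_T$, then
$$E(X_i) = \frac{1}{T}\sum_{j=1}^{T} a_j = \mu$$
and
$$\operatorname{Var}(X_i) = \frac{1}{T}\sum_{j=1}^{T} (a_j - \mu)^2 = \sigma^2.$$
It follows that $E(X) = n\mu$. Furthermore, by Theorem 4.6.7,
$$\operatorname{Var}(X) = \sum_{i=1}^{n} \operatorname{Var}(X_i) + 2\sum\sum_{i<j} \operatorname{Cov}(X_i, X_j).$$
Because of the symmetry among the variables $X_1, \ldots, X_n$, it follows that
$$\operatorname{Var}(X) = n\sigma^2 + n(n - 1)\operatorname{Cov}(X_1, X_2).$$
We know that $\operatorname{Var}(X) = 0$ for $n = T$. Therefore,
$$\operatorname{Cov}(X_1, X_2) = -\frac{\sigma^2}{T - 1}.$$
It now follows that
$$\operatorname{Var}(X) = n\sigma^2 - \frac{n(n-1)}{T-1}\sigma^2 = n\sigma^2\left(\frac{T - n}{T - 1}\right).$$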
9. By Eq. (5.3.14),
$$\binom{3/2}{4} = \frac{(3/2)(1/2)(-1/2)(-3/2)}{4!} = \frac{3}{128}.$$
10. By Eq. (5.3.14),
$$\binom{-n}{k} = \frac{(-n)(-n-1)\cdots(-n-k+1)}{k!} = (-1)^k\,\frac{(n)(n+1)\cdots(n+k-1)}{k!}.$$
If we reverse the order of the factors in the numerator, we can rewrite this relation as follows:
$$\binom{-n}{k} = (-1)^k\,\frac{(n+k-1)(n+k-2)\cdots(n)}{k!} = (-1)^k\binom{n+k-1}{k}.$$
11. Write $(1 + a_n)^{c_n} e^{-a_n c_n} = \exp[c_n \log(1 + a_n) - a_n c_n]$. The result is proven if we can show that
$$\lim_{n \to \infty} [c_n \log(1 + a_n) - a_n c_n] = 0. \tag{S.5.1}$$
Use Taylor's theorem with remainder to write
$$\log(1 + a_n) = a_n - \frac{a_n^2}{2(1 + y_n)^2},$$
where $y_n$ is between 0 and $a_n$. It follows that
$$c_n \log(1 + a_n) - a_n c_n = c_n a_n - \frac{c_n a_n^2}{2(1 + y_n)^2} - a_n c_n = -\frac{c_n a_n^2}{2(1 + y_n)^2}.$$
We have assumed that $c_n a_n^2$ goes to 0. Since $y_n$ is between 0 and $a_n$, and $a_n$ goes to 0, we have that
$1/[2(1 + y_n)^2]$ goes to 1/2, so the product goes to 0. This establishes (S.5.1).


5.4 The Poisson Distributions 


Commentary 


This section ends with a more theoretical look at the assumptions underlying the Poisson process. This 
material is designed for the more mathematically inclined students who might wish to see a derivation of the 
Poisson distribution from those assumptions. Such a derivation is outlined in Exercise 16 in this section. 

If one is using the statistical software R, then the functions dpois, ppois, and qpois give the p.f., the
c.d.f., and the quantile function of Poisson distributions. The syntax is that the first argument is the argument
of the function, and the second is the mean. The function rpois gives a random sample of Poisson random
variables. The first argument is how many you want, and the second is the mean. All of the solutions that
require the calculation of Poisson probabilities can be done using these functions instead of tables.


Solutions to Exercises 


1. The number of oocysts $X$ in $t = 100$ liters of water has the Poisson distribution with mean
$0.2 \times 0.1 \times 100 = 2$. Using the Poisson distribution table in the back of the book, we find
$$\Pr(X \ge 2) = 1 - \Pr(X \le 1) = 1 - 0.1353 - 0.2707 = 0.594.$$
2. From the table of the Poisson distribution in the back of the book it is found that
$$\Pr(X \ge 3) = .0284 + .0050 + .0007 + .0001 + .0000 = .0342.$$


3. Since the number of defects on each bolt has the Poisson distribution with mean 0.4, and the observations
for the five bolts are independent, the sum for the numbers of defects on five bolts will have the
Poisson distribution with mean $5(0.4) = 2$. It is found from the table of the Poisson distribution that
$$\Pr(X \ge 6) = .0120 + .0034 + .0009 + .0002 + .0000 = .0165.$$
There is some rounding error in this, and 0.0166 is closer.
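
The tail probabilities in Exercises 1 and 3 can be computed without tables, which also shows the rounding effect mentioned in Exercise 3. This is a minimal Python sketch; `pois_pf` and `pois_cdf` are illustrative names chosen here to mirror R's dpois and ppois.

```python
from math import exp, factorial

def pois_pf(x, mean):
    """p.f. of the Poisson distribution with the given mean (analog of R's dpois)."""
    return exp(-mean) * mean**x / factorial(x)

def pois_cdf(x, mean):
    """c.d.f. of the Poisson distribution at a nonnegative integer x (analog of R's ppois)."""
    return sum(pois_pf(k, mean) for k in range(0, x + 1))

ex1 = 1 - pois_cdf(1, 2)   # Exercise 1: Pr(X >= 2) with mean 2
ex3 = 1 - pois_cdf(5, 2)   # Exercise 3: Pr(X >= 6) with mean 2

print(ex1, ex3)
```

Computing the upper tail as one minus the c.d.f. avoids adding many small rounded table entries.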
4. If $f(x \mid \lambda)$ is the p.f. of the Poisson distribution with mean $\lambda$, then
$$\Pr(X = 0) = f(0 \mid \lambda) = \exp(-\lambda).$$


5. Let $Y$ denote the number of misprints on a given page. Then the probability $p$ that a given page will
contain more than $k$ misprints is
$$p = \Pr(Y > k) = \sum_{i=k+1}^{\infty} f(i \mid \lambda) = \sum_{i=k+1}^{\infty} \frac{\exp(-\lambda)\lambda^i}{i!}.$$
Therefore,
$$1 - p = \sum_{i=0}^{k} f(i \mid \lambda) = \sum_{i=0}^{k} \frac{\exp(-\lambda)\lambda^i}{i!}.$$
Now let $X$ denote the number of pages, among the $n$ pages in the book, on which there are more than
$k$ misprints. Then for $x = 0, 1, \ldots, n$,
$$\Pr(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$$
and
$$\Pr(X \ge m) = \sum_{x=m}^{n} \binom{n}{x} p^x (1 - p)^{n - x}.$$


6. We shall assume that defects occur in accordance with a Poisson process. Then the number of defects
in 1200 feet of tape will have the Poisson distribution with mean $\mu = 3(1.2) = 3.6$. Therefore, the
probability that there will be no defects is $\exp(-\mu) = \exp(-3.6)$.


7. We shall assume that customers are served in accordance with a Poisson process. Then the number of
customers served in a two-hour period will have the Poisson distribution with mean $\mu = 2(15) = 30$.
Therefore, the probability that more than 20 customers will be served is
$$\Pr(X > 20) = \sum_{x=21}^{\infty} \frac{\exp(-30)(30)^x}{x!}.$$
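For $x = 0, 1, \ldots, k$,
$$\Pr(X_1 = x \mid X_1 + X_2 = k) = \frac{\Pr(X_1 = x \text{ and } X_1 + X_2 = k)}{\Pr(X_1 + X_2 = k)} = \frac{\Pr(X_1 = x \text{ and } X_2 = k - x)}{\Pr(X_1 + X_2 = k)}.$$
Since $X_1$ and $X_2$ are independent,
$$\Pr(X_1 = x \text{ and } X_2 = k - x) = \Pr(X_1 = x)\Pr(X_2 = k - x).$$
Also, by Theorem 5.4.4 the sum $X_1 + X_2$ will have the Poisson distribution with mean $\lambda_1 + \lambda_2$. Hence,
$$\begin{aligned}
\Pr(X_1 = x) &= \frac{\exp(-\lambda_1)\lambda_1^x}{x!}, \\
\Pr(X_2 = k - x) &= \frac{\exp(-\lambda_2)\lambda_2^{k-x}}{(k-x)!}, \\
\Pr(X_1 + X_2 = k) &= \frac{\exp(-(\lambda_1 + \lambda_2))(\lambda_1 + \lambda_2)^k}{k!}.
\end{aligned}$$
It now follows that
$$\Pr(X_1 = x \mid X_1 + X_2 = k) = \frac{k!}{x!(k-x)!}\left(\frac{\lambda_1}{\lambda_1 + \lambda_2}\right)^x\left(\frac{\lambda_2}{\lambda_1 + \lambda_2}\right)^{k-x} = \binom{k}{x} p^x (1 - p)^{k-x},$$
where $p = \lambda_1/(\lambda_1 + \lambda_2)$. It can now be seen that this conditional distribution is a binomial distribution
with parameters $k$ and $p = \lambda_1/(\lambda_1 + \lambda_2)$.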


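9. Let $N$ denote the total number of items produced by the machine and let $X$ denote the number of
defective items produced by the machine. Then, for $x = 0, 1, \ldots$,
$$\Pr(X = x) = \sum_{n=0}^{\infty} \Pr(X = x \mid N = n)\Pr(N = n).$$
Clearly, it must be true that $X \le N$. Therefore, the terms in this summation for $n < x$ will be 0, and
we may write
$$\Pr(X = x) = \sum_{n=x}^{\infty} \Pr(X = x \mid N = n)\Pr(N = n).$$
Clearly, $\Pr(X = 0 \mid N = 0) = 1$. Also, given that $N = n > 0$, the conditional distribution of $X$ will be
a binomial distribution with parameters $n$ and $p$. Therefore,
$$\Pr(X = x \mid N = n) = \frac{n!}{x!(n-x)!}\, p^x (1-p)^{n-x}.$$
Also, since $N$ has the Poisson distribution with mean $\lambda$,
$$\Pr(N = n) = \frac{\exp(-\lambda)\lambda^n}{n!}.$$
Hence,
$$\Pr(X = x) = \sum_{n=x}^{\infty} \frac{n!}{x!(n-x)!}\, p^x (1-p)^{n-x}\, \frac{\exp(-\lambda)\lambda^n}{n!} = \frac{1}{x!}\, p^x \exp(-\lambda) \sum_{n=x}^{\infty} \frac{1}{(n-x)!}(1-p)^{n-x}\lambda^n.$$
If we let $t = n - x$, then
$$\begin{aligned}
\Pr(X = x) &= \frac{1}{x!}\, p^x \exp(-\lambda) \sum_{t=0}^{\infty} \frac{1}{t!}(1-p)^t \lambda^{t+x} \\
&= \frac{(\lambda p)^x}{x!} \exp(-\lambda) \sum_{t=0}^{\infty} \frac{[\lambda(1-p)]^t}{t!} \\
&= \frac{(\lambda p)^x}{x!} \exp(-\lambda)\exp(\lambda(1 - p)) \\
&= \frac{\exp(-\lambda p)(\lambda p)^x}{x!}.
\end{aligned}$$
It can be seen that this final term is the value of the p.f. of the Poisson distribution with mean $\lambda p$.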
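
The identity derived above can be checked numerically by truncating the sum over n. In the Python sketch below, the values of lam and p are illustrative choices made here, not values from the exercise, and the truncation point n_max is an assumption that is harmless for these values because the neglected Poisson tail is negligible.

```python
from math import comb, exp, factorial

def pois_pf(x, mean):
    """p.f. of the Poisson distribution with the given mean."""
    return exp(-mean) * mean**x / factorial(x)

def binom_pf(x, n, p):
    """p.f. of the binomial distribution with parameters n and p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def marginal_pf(x, lam, p, n_max=100):
    """Pr(X = x) as the sum over n >= x of Pr(X = x | N = n) Pr(N = n), truncated at n_max."""
    return sum(binom_pf(x, n, p) * pois_pf(n, lam) for n in range(x, n_max + 1))

lam, p = 4.0, 0.25   # illustrative values only
for x in range(6):
    # The two columns should agree: mixture sum versus Poisson p.f. with mean lam * p.
    print(x, marginal_pf(x, lam, p), pois_pf(x, lam * p))
```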
10. It must be true that $X + Y = N$. Therefore, for any nonnegative integers $x$ and $y$,
$$\begin{aligned}
\Pr(X = x \text{ and } Y = y) &= \Pr(X = x \text{ and } N = x + y) \\
&= \Pr(X = x \mid N = x + y)\Pr(N = x + y) \\
&= \frac{(x+y)!}{x!\,y!}\, p^x (1-p)^y\, \frac{\exp(-\lambda)\lambda^{x+y}}{(x+y)!} \\
&= \exp(-\lambda)\, \frac{(\lambda p)^x}{x!}\, \frac{[\lambda(1-p)]^y}{y!}.
\end{aligned}$$
The fact that we have factored $\Pr(X = x \text{ and } Y = y)$ into the product of a function of $x$ and a function
of $y$ is sufficient for us to be able to conclude that $X$ and $Y$ are independent. However, if we continue
further and write
$$\exp(-\lambda) = \exp(-\lambda p)\exp(-\lambda(1 - p)),$$
then we can obtain the factorization
$$\Pr(X = x \text{ and } Y = y) = \frac{\exp(-\lambda p)(\lambda p)^x}{x!} \cdot \frac{\exp(-\lambda(1-p))[\lambda(1-p)]^y}{y!}.$$


11. If $f(x \mid \lambda)$ denotes the p.f. of the Poisson distribution with mean $\lambda$, then
$$\frac{f(x+1 \mid \lambda)}{f(x \mid \lambda)} = \frac{\lambda}{x+1}.$$
Therefore, $f(x \mid \lambda) < f(x+1 \mid \lambda)$ if and only if $x + 1 < \lambda$. It follows that if $\lambda$ is not an integer, then the
mode of this distribution will be the largest integer $x$ that is less than $\lambda$ or, equivalently, the smallest
integer $x$ such that $x + 1 > \lambda$. If $\lambda$ is an integer, then both the values $\lambda - 1$ and $\lambda$ will be modes.


12. It can be assumed that the exact distribution of the number of colorblind people in the group is a
binomial distribution with parameters $n = 600$ and $p = 0.005$. Therefore, this distribution can be
approximated by a Poisson distribution with mean $600(0.005) = 3$. It is found from the table of the
Poisson distribution that
$$\Pr(X \le 1) = .0498 + .1494 = .1992.$$


13. It can be assumed that the exact number of sets of triplets in this hospital is a binomial distribution
with parameters $n = 700$ and $p = 0.001$. Therefore, this distribution can be approximated by a Poisson
distribution with mean $700(0.001) = 0.7$. It is found from the table of the Poisson distribution that
$$\Pr(X = 1) = 0.3476.$$


14. Let $X$ denote the number of people who do not appear for the flight. Then everyone who does appear
will have a seat if and only if $X \ge 2$. It can be assumed that the exact distribution of $X$ is a binomial
distribution with parameters $n = 200$ and $p = 0.01$. Therefore, this distribution can be approximated
by a Poisson distribution with mean $200(0.01) = 2$. It is found from the table of the Poisson distribution
that
$$\Pr(X \ge 2) = 1 - \Pr(X \le 1) = 1 - .1353 - .2707 = .5940.$$
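
The quality of the Poisson approximation in Exercises 12 to 14 can be gauged by computing the exact binomial values alongside it. The Python sketch below is an illustration only; the helper names and the `comparisons` list are chosen here, and the parameters are the ones stated in the three solutions.

```python
from math import comb, exp, factorial

def binom_pf(x, n, p):
    """p.f. of the binomial distribution with parameters n and p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def pois_pf(x, mean):
    """p.f. of the Poisson distribution with the given mean."""
    return exp(-mean) * mean**x / factorial(x)

# Each entry: (label, exact binomial value, Poisson approximation).
comparisons = [
    ("Exercise 12, Pr(X <= 1)",
     binom_pf(0, 600, 0.005) + binom_pf(1, 600, 0.005),
     pois_pf(0, 3) + pois_pf(1, 3)),
    ("Exercise 13, Pr(X = 1)",
     binom_pf(1, 700, 0.001),
     pois_pf(1, 0.7)),
    ("Exercise 14, Pr(X >= 2)",
     1 - binom_pf(0, 200, 0.01) - binom_pf(1, 200, 0.01),
     1 - pois_pf(0, 2) - pois_pf(1, 2)),
]

for label, exact, approx in comparisons:
    print(label, round(exact, 4), round(approx, 4))
```

In all three cases n is large and p is small, which is the regime in which the approximation is expected to work well.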
The joint p.f./p.d.f. of $X$ and $\Lambda$ is the Poisson p.f. with parameter $\lambda$ times $f(\lambda)$, which equals
$$\frac{\exp(-\lambda)\lambda^x}{x!}\, 2\exp(-2\lambda) = \frac{2\lambda^x \exp(-3\lambda)}{x!}. \tag{S.5.2}$$
We need to compute the marginal p.f. of $X$ at $x = 1$ and divide that into (S.5.2) to get the conditional
p.d.f. of $\Lambda$ given $X = 1$. The marginal p.f. of $X$ at $x = 1$ is the integral of (S.5.2) over $\lambda$ when $x = 1$ is
plugged in:
$$f_1(1) = \int_0^{\infty} 2\lambda\exp(-3\lambda)\,d\lambda = \frac{2}{9}.$$
This makes the conditional p.d.f. of $\Lambda$ equal to $9\lambda\exp(-3\lambda)$ for $\lambda > 0$.
(a) Let $A = \bigcup_{i=1}^{n} A_i$. Then
$$\{X = k\} = (\{X = k\} \cap A) \cup (\{X = k\} \cap A^c).$$
The second event on the right side of this equation is $\{W_n = k\}$. Call the first event on the right
side of this equation $B$. Then $B \subset A$. Since $B$ and $\{W_n = k\}$ are disjoint,
$\Pr(X = k) = \Pr(W_n = k) + \Pr(B)$.


(b) Since the subintervals are disjoint, the events $A_1, \ldots, A_n$ are independent. Since the subintervals
all have the same length $t/n$, each $A_i$ has the same probability. It follows that
$$\Pr\left(\bigcap_{i=1}^{n} A_i^c\right) = [1 - \Pr(A_1)]^n.$$
By assumption, $\Pr(A_i) = o(1/n)$, so
$$\Pr(A) = 1 - \Pr\left(\bigcap_{i=1}^{n} A_i^c\right) = 1 - [1 - o(1/n)]^n.$$
So,
$$\lim_{n \to \infty} \Pr(A) = 1 - \lim_{n \to \infty} [1 - o(1/n)]^n = 1 - 1 = 0,$$
according to Eq. (5.4.9).


(c) Since the $Y_i$ are i.i.d. Bernoulli random variables with parameter $p_n = \lambda t/n + o(1/n)$, we know
that $W_n$ has the binomial distribution with parameters $n$ and $p_n$. Hence
$$\Pr(W_n = k) = \binom{n}{k} p_n^k (1 - p_n)^{n-k} = \frac{n!}{k!(n-k)!}\left[\frac{\lambda t}{n} + o(1/n)\right]^k \left[1 - \frac{\lambda t}{n} - o(1/n)\right]^{n-k}.$$
For fixed $k$,
$$\lim_{n \to \infty} n^k \left[\frac{\lambda t}{n} + o(1/n)\right]^k = (\lambda t)^k.$$
Also, using the formula stated in the exercise,
$$\lim_{n \to \infty} \left[1 - \frac{\lambda t}{n} - o(1/n)\right]^{n-k} = \exp(-\lambda t).$$
It follows that
$$\lim_{n \to \infty} \Pr(W_n = k) = \exp(-\lambda t)\,\frac{(\lambda t)^k}{k!}\,\lim_{n \to \infty} \frac{n!}{n^k (n-k)!}.$$
We can write
$$\frac{n!}{n^k (n-k)!} = \frac{n(n-1)\cdots(n-k+1)}{n \cdot n \cdots n}.$$
For fixed $k$, the limit of this ratio is 1 as $n \to \infty$.
(d) We have established that
$$\Pr(X = k) = \Pr(W_n = k) + \Pr(B).$$
Since the left side of this equation does not depend on $n$, we can write
$$\Pr(X = k) = \lim_{n \to \infty} \Pr(W_n = k) + \lim_{n \to \infty} \Pr(B).$$
In earlier parts of this exercise we showed that the two limits on the right are $\exp(-\lambda t)(\lambda t)^k/k!$
and 0 respectively. So, $X$ has the Poisson distribution with mean $\lambda t$.


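17. Because $n_T A_T/(A_T + B_T)$ converges to $\lambda$, $n_T/(A_T + B_T)$ goes to 0. Hence, $B_T$ eventually gets larger
than $n_T$. Once $B_T$ is larger than $n_T + x$ and $A_T$ is larger than $x$, we have
$$\Pr(X_T = x) = \frac{\binom{A_T}{x}\binom{B_T}{n_T - x}}{\binom{A_T + B_T}{n_T}} = \frac{A_T!\,B_T!\,n_T!\,(A_T + B_T - n_T)!}{x!\,(A_T - x)!\,(n_T - x)!\,(B_T - n_T + x)!\,(A_T + B_T)!}.$$
Apply Stirling's formula to each of the factorials in the above expression except $x!$. A little manipulation
gives that
$$\lim_{T \to \infty} \frac{e^{-x} A_T^{A_T + 1/2}\, B_T^{B_T + 1/2}\, n_T^{n_T + 1/2}\, (A_T + B_T - n_T)^{A_T + B_T - n_T + 1/2}}{\Pr(X_T = x)\, x!\, (A_T - x)^{A_T - x + 1/2}\, (n_T - x)^{n_T - x + 1/2}\, (B_T - n_T + x)^{B_T - n_T + x + 1/2}\, (A_T + B_T)^{A_T + B_T + 1/2}} = 1. \tag{S.5.3}$$
Each of the following limits follows from Theorem 5.3.3:
$$\begin{aligned}
\lim_{T \to \infty} \left(\frac{A_T}{A_T - x}\right)^{A_T - x + 1/2} &= e^{x}, \\
\lim_{T \to \infty} \left(\frac{B_T}{B_T - n_T + x}\right)^{B_T - n_T + x + 1/2} e^{-n_T} &= e^{-x}, \\
\lim_{T \to \infty} \left(\frac{A_T + B_T - n_T}{A_T + B_T}\right)^{A_T + B_T - n_T + 1/2} e^{n_T} &= 1, \\
\lim_{T \to \infty} \left(\frac{n_T}{n_T - x}\right)^{n_T - x + 1/2} &= e^{x}, \\
\lim_{T \to \infty} \left(\frac{B_T}{A_T + B_T}\right)^{n_T - x} &= e^{-\lambda}.
\end{aligned}$$
Inserting these limits in (S.5.3) yields
$$\lim_{T \to \infty} \frac{A_T^x\, n_T^x\, e^{-\lambda}}{\Pr(X_T = x)\, x!\, (A_T + B_T)^x} = 1. \tag{S.5.4}$$
Since $n_T A_T/(A_T + B_T)$ converges to $\lambda$, we have
$$\lim_{T \to \infty} \frac{n_T^x A_T^x}{(A_T + B_T)^x} = \lambda^x. \tag{S.5.5}$$
Together (S.5.4) and (S.5.5) imply that
$$\lim_{T \to \infty} \frac{e^{-\lambda}\lambda^x / x!}{\Pr(X_T = x)} = 1.$$
The numerator of this last expression is $\Pr(Y = x)$, which completes the proof.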


Copyright © 2012 Pearson Education, Inc. Publishing as Addison-Wesley. 




18. First write
$$\frac{n_T A_T}{B_T} - \frac{n_T A_T}{A_T + B_T} = \frac{n_T A_T^2}{B_T(A_T + B_T)} = n_T \cdot \frac{A_T}{B_T} \cdot \frac{A_T}{A_T + B_T}. \tag{S.5.6}$$
For the "if" part, assume that $n_T A_T/B_T$ converges to $\lambda$. Since $n_T$ goes to $\infty$, then $A_T/B_T$ goes to 0,
which implies that $A_T/(A_T + B_T)$ (which is smaller) goes to 0. In the final expression in (S.5.6),
the product of the first two factors goes to $\lambda$ by assumption, and the third factor goes to 0, so
$n_T A_T/(A_T + B_T)$ converges to the same thing as $n_T A_T/B_T$, namely $\lambda$. For the "only if" part, assume
that $n_T A_T/(A_T + B_T)$ converges to $\lambda$. It follows that $A_T/(A_T + B_T) = 1/(1 + B_T/A_T)$ goes to
0, hence $A_T/B_T$ goes to 0. In the last expression in (S.5.6), the product of the first and third factors
goes to $\lambda$ by assumption, and the second factor goes to 0, hence $n_T A_T/B_T$ converges to the same thing
as $n_T A_T/(A_T + B_T)$, namely $\lambda$.


5.5 The Negative Binomial Distributions 


Commentary 


This section ends with a discussion of how to extend the definition of negative binomial distribution by 
making use of the extended definition of binomial coefficients from Sec. 5.3. 

If one is using the statistical software R, then the functions dnbinom, pnbinom, and qnbinom give the
p.f., the c.d.f., and the quantile function of negative binomial distributions. The syntax is that the first
argument is the argument of the function, and the next two are r and p in the notation of the text. The
function rnbinom gives a random sample of negative binomial random variables. The first argument is how many you
want, and the next two are r and p. All of the solutions that require the calculation of negative binomial
probabilities can be done using these functions. There are also functions dgeom, pgeom, qgeom, and rgeom
that compute similar features of geometric distributions. Just remove the "r" argument.


Solutions to Exercises 


1. (a) Two particular days in a row have independent draws, and each draw has probability 0.01 of
producing triples. So, the probability that two particular days in a row will both have triples is
$10^{-4}$.
(b) Since a particular day and the next day are independent, the conditional probability of triples on 
the next day is 0.01 conditional on whatever happens on the first day. 


2. (a) The number of tails will have the negative binomial distribution with parameters $r = 5$ and
$p = 1/30$. By Eq. (5.5.7),
$$E(X) = \frac{r(1 - p)}{p} = 5(29) = 145.$$
(b) By Eq. (5.5.7), $\operatorname{Var}(X) = \dfrac{r(1 - p)}{p^2} = 4350$.


3. (a) Let $X$ denote the number of tails that are obtained before five heads are obtained, and let $Y$ denote
the total number of tosses that are required. Then $Y = X + 5$. Therefore, $E(Y) = E(X) + 5$. It
follows from Exercise 2(a) that $E(Y) = 150$.


(b) Since $Y = X + 5$, $\operatorname{Var}(Y) = \operatorname{Var}(X)$. Therefore, it follows from Exercise 2(b) that
$\operatorname{Var}(Y) = 4350$.
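
The mean and variance in Exercises 2 and 3 can be confirmed by summing the negative binomial p.f. directly. This Python sketch is an illustration; `nbinom_pf` is a name chosen here, and the truncation at 3000 failures is an assumption that is safe for r = 5 and p = 1/30 because the neglected tail is negligible.

```python
from math import comb

def nbinom_pf(x, r, p):
    """p.f. of the negative binomial distribution: x failures before the r-th success."""
    return comb(r + x - 1, x) * p**r * (1 - p)**x

r, p = 5, 1 / 30
xs = range(0, 3001)   # truncation point for the infinite sums

mean_x = sum(x * nbinom_pf(x, r, p) for x in xs)
var_x = sum(x * x * nbinom_pf(x, r, p) for x in xs) - mean_x**2
mean_y = mean_x + r   # total number of tosses, Y = X + 5

print(mean_x, var_x, mean_y)
```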
4. (a) The number of failures $X_A$ obtained by player A before he obtains $r$ successes will have the negative
binomial distribution with parameters $r$ and $p$. The total number of throws required by player A
will be $Y_A = X_A + r$. Therefore,
$$E(Y_A) = E(X_A) + r = r\,\frac{1 - p}{p} + r = \frac{r}{p}.$$
The number of failures $X_B$ obtained by player B before he obtains $mr$ successes will have the
negative binomial distribution with parameters $mr$ and $mp$. The total number of throws required
by player B will be $Y_B = X_B + mr$. Therefore,
$$E(Y_B) = E(X_B) + mr = (mr)\,\frac{1 - mp}{mp} + mr = \frac{r}{p}.$$
(b)
$$\operatorname{Var}(Y_A) = \operatorname{Var}(X_A) = \frac{r(1 - p)}{p^2} = \frac{r}{p}\left(\frac{1}{p} - 1\right) \quad \text{and}$$
$$\operatorname{Var}(Y_B) = \operatorname{Var}(X_B) = \frac{mr(1 - mp)}{(mp)^2} = \frac{r}{p}\left(\frac{1}{mp} - 1\right).$$
Therefore, $\operatorname{Var}(Y_B) < \operatorname{Var}(Y_A)$.


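5. By Eq. (5.5.6), the m.g.f. of $X_i$ is
$$\psi_i(t) = \left(\frac{p}{1 - (1 - p)e^t}\right)^{r_i} \quad \text{for } t < \log\left(\frac{1}{1 - p}\right).$$
Therefore, the m.g.f. of $X_1 + \cdots + X_k$ is
$$\psi(t) = \prod_{i=1}^{k} \psi_i(t) = \left(\frac{p}{1 - (1 - p)e^t}\right)^{r_1 + \cdots + r_k} \quad \text{for } t < \log\left(\frac{1}{1 - p}\right).$$
Since $\psi(t)$ is the m.g.f. of the negative binomial distribution with parameters $r_1 + \cdots + r_k$ and $p$, that
must be the distribution of $X_1 + \cdots + X_k$.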


6. For « =0,1,2,..., 


Pr(X = x) = p(1— p)’. 


If we let x = 27, then as 7 runs through all the integers 0,1,2,..., the value of 27 will run through all 
the even integers 0,2,4,.... Therefore, 
[o-e) loc) 1 
; . eo i Qi _ 
Pr(X is an even integer) = 2 P(t —p)"= pd (0 — p|*)' = Pl 
7. $\Pr(X \ge k) = \sum_{x=k}^{\infty} p(1-p)^x = p(1-p)^k \sum_{x=k}^{\infty} (1-p)^{x-k}$. If we let $i = x - k$, then

\[ \Pr(X \ge k) = p(1-p)^k \sum_{i=0}^{\infty} (1-p)^i = p(1-p)^k \frac{1}{1-[1-p]} = (1-p)^k. \]


8.
\[ \Pr(X = k+t \mid X \ge k) = \frac{\Pr(X = k+t \text{ and } X \ge k)}{\Pr(X \ge k)} = \frac{\Pr(X = k+t)}{\Pr(X \ge k)}. \]

By Eq. (5.5.3), $\Pr(X = k+t) = p(1-p)^{k+t}$. By Exercise 7, $\Pr(X \ge k) = (1-p)^k$. Therefore,
$\Pr(X = k+t \mid X \ge k) = p(1-p)^t = \Pr(X = t)$.
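
The tail formula of Exercise 7 and the memoryless identity of Exercise 8 can be confirmed exactly for particular values. The sketch below is illustrative only: the function names and the choices p = 1/3, k = 4, t = 2 are assumptions, not part of the original solutions.

```python
from fractions import Fraction


def geom_pf(x, p):
    """Geometric p.f.: probability of exactly x failures before the first success."""
    return p * (1 - p) ** x


def geom_tail(k, p):
    """Pr(X >= k), computed as one minus the finite sum of Pr(X = x) for x < k."""
    return 1 - sum(geom_pf(x, p) for x in range(k))


p = Fraction(1, 3)   # hypothetical success probability
k, t = 4, 2          # hypothetical values of k and t

tail = geom_tail(k, p)                     # Exercise 7: equals (1 - p)^k
conditional = geom_pf(k + t, p) / tail     # Exercise 8: Pr(X = k + t | X >= k)
```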


9. Since the components are connected in series, the system will function properly only as long as every
component functions properly. Let $X_i$ denote the number of periods that component $i$ functions
properly, for $i = 1, \ldots, n$, and let $X$ denote the number of periods that the system functions properly. Then for
any nonnegative integer $x$,

\[ \Pr(X \ge x) = \Pr(X_1 \ge x, \ldots, X_n \ge x) = \Pr(X_1 \ge x) \cdots \Pr(X_n \ge x), \]

because the $n$ components are independent. By Exercise 7,

\[ \Pr(X_i \ge x) = (1-p_i)^x. \]

Therefore, $\Pr(X \ge x) = \prod_{i=1}^{n} (1-p_i)^x$. It follows that

\[ \Pr(X = x) = \Pr(X \ge x) - \Pr(X \ge x+1) = \prod_{i=1}^{n} (1-p_i)^x - \prod_{i=1}^{n} (1-p_i)^{x+1}
 = \left(1 - \prod_{i=1}^{n} (1-p_i)\right)\left(\prod_{i=1}^{n} (1-p_i)\right)^x. \]

It can be seen that this is the p.f. of the geometric distribution with $p = 1 - \prod_{i=1}^{n}(1-p_i)$.


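10. By the assumptions of the problem we have $p = 1 - \lambda/r$ and $r \to \infty$. To simplify some of the formulas,
let $q = 1 - p$ so that $q = \lambda/r$. It now follows from Eq. (5.5.1) that

\[ f(x \mid r, p) = \binom{r+x-1}{x} p^r q^x
 = \frac{[q(r+x-1)][q(r+x-2)] \cdots [qr]}{x!} \left(1 - \frac{\lambda}{r}\right)^r. \]

As $r \to \infty$, each factor $q(r+x-k) = \lambda(r+x-k)/r$ converges to $\lambda$, and $(1 - \lambda/r)^r \to \exp(-\lambda)$.

Hence, $f(x \mid r, p) \to \frac{\lambda^x}{x!} \exp(-\lambda) = f(x \mid \lambda)$.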


11. According to Exercise 10 in Sec. 5.3,

\[ \binom{-r}{x} = (-1)^x \binom{r+x-1}{x}. \]

This makes

\[ \binom{-r}{x} p^r (-[1-p])^x = \binom{r+x-1}{x} p^r (1-p)^x, \]

which is the proper form of the negative binomial p.f. for $x = 0, 1, 2, \ldots$.


12. The joint p.f./p.d.f. of $X$ and $P$ is $f(p)$ times the geometric p.f. with parameter $p$, that is

\[ p(1-p)^x \, 10(1-p)^9 = 10p(1-p)^{x+9}, \quad \text{for } x = 0, 1, \ldots \text{ and } 0 < p < 1. \qquad \text{(S.5.7)} \]

The marginal p.f. of $X$ at $x = 12$ is the integral of (S.5.7) over $p$ with $x = 12$ substituted:

\[ \int_0^1 10p(1-p)^{21}\,dp = 10\,\frac{\Gamma(2)\Gamma(22)}{\Gamma(24)} = \frac{10}{22(23)} = \frac{10}{506}. \]

The conditional p.d.f. of $P$ given $X = 12$ is (S.5.7) divided by this last value:

\[ g(p \mid 12) = 506\,p(1-p)^{21}, \quad \text{for } 0 < p < 1. \]
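
The constant 506 can be verified with exact rational arithmetic. This is a small sketch under the assumption, read off from (S.5.7), that the prior p.d.f. is 10(1 - p)^9; the helper name `beta_integral` is hypothetical.

```python
from fractions import Fraction
from math import factorial


def beta_integral(a, b):
    """Integral of p^(a-1) (1-p)^(b-1) over (0, 1) for positive integers a, b."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


# Marginal p.f. of X at x = 12: integral of 10 p (1 - p)^21 over (0, 1).
marginal_at_12 = 10 * beta_integral(2, 22)
# Constant that multiplies p (1 - p)^21 in the conditional p.d.f.
posterior_constant = 10 / marginal_at_12
```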


13. (a) The memoryless property says that, for all $k, t \ge 0$,

\[ \frac{\Pr(X = k+t)}{1 - F(t-1)} = \Pr(X = k). \]

(The above version switches the use of $k$ and $t$ from Theorem 5.5.5.) If we sum both sides of this
over $k = h, h+1, \ldots$, we get

\[ \frac{1 - F(t+h-1)}{1 - F(t-1)} = 1 - F(h-1). \]

(b) $\ell(t+h) = \log[1 - F(t+h-1)]$. From part (a), we have

\[ 1 - F(t+h-1) = [1 - F(t-1)][1 - F(h-1)]. \]

Hence

\[ \ell(t+h) = \log[1 - F(t-1)] + \log[1 - F(h-1)] = \ell(t) + \ell(h). \]

(c) We prove this by induction. Clearly $\ell(1) = 1 \times \ell(1)$, so the result holds for $t = 1$. Assume that the
result holds for all $t \le t_0$. Then $\ell(t_0 + 1) = \ell(t_0) + \ell(1)$ by part (b). By the induction hypothesis,
$\ell(t_0) = t_0 \ell(1)$, hence $\ell(t_0 + 1) = (t_0 + 1)\ell(1)$, and the result holds for $t = t_0 + 1$.

(d) Since $\ell(1) = \log[1 - F(0)]$, we have $\ell(1) < 0$. Let $p = 1 - \exp[\ell(1)]$, which is between 0 and 1. For
every integer $x \ge 1$, we have, from part (c) and the definition of $\ell$, that

\[ F(x-1) = 1 - \exp[\ell(x)] = 1 - \exp[x\ell(1)] = 1 - (1-p)^x. \]

Setting $t = x - 1$ for $x \ge 1$, we get

\[ F(t) = 1 - (1-p)^{t+1}, \quad \text{for } t = 0, 1, \ldots. \qquad \text{(S.5.8)} \]

It is easy to verify that (S.5.8) is the c.d.f. of the geometric distribution with parameter $p$.


5.6 The Normal Distributions


Commentary 


In addition to introducing the family of normal distributions, we also describe the family of lognormal dis- 
tributions. These distributions arise frequently in engineering and financial applications. (Examples 5.6.9 
and 5.6.10 give two such cases.) It is true that lognormal distributions are nothing more than simple trans- 
formations of normal distributions. However, at this point in their study, many students will not yet be 
sufficiently comfortable with transformations to be able to derive these distributions and their properties 
without a little help. 

If one is using the statistical software R, then the functions dnorm, pnorm, and qnorm give the p.d.f.,
the c.d.f., and the quantile function of normal distributions. The syntax is that the first argument is the
argument of the function, and the next two are the mean and standard deviation. The function rnorm gives
a random sample of normal random variables. The first argument is how many you want, and the next two
are the mean and standard deviation. All of the solutions that require the calculation of normal probabilities
and quantiles can be done using these functions instead of tables. There are also functions dlnorm, plnorm,
qlnorm, and rlnorm that compute similar features for lognormal distributions.
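
For readers who work in Python rather than R, the standard library class `statistics.NormalDist` offers rough counterparts to dnorm, pnorm, qnorm, and rnorm. The sketch below is an illustration only and is not part of the manual; the particular means, standard deviations, and seed are arbitrary choices.

```python
from statistics import NormalDist

z = NormalDist()                # standard normal: mean 0, standard deviation 1
x = NormalDist(mu=1, sigma=2)   # mean 1, standard deviation 2

density = z.pdf(0.0)            # plays the role of dnorm(0)
prob = x.cdf(3.0)               # plays the role of pnorm(3, 1, 2)
quantile = z.inv_cdf(0.75)      # plays the role of qnorm(0.75)
sample = x.samples(5, seed=1)   # plays the role of rnorm(5, 1, 2)

print(round(prob, 4), round(quantile, 4))
```

As in R, the second parameter is the standard deviation, not the variance.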


Solutions to Exercises 


1. By the symmetry of the standard normal distribution around 0, the 0.5 quantile must be 0. The 0.75
quantile is found by locating 0.75 in the $\Phi(x)$ column of the standard normal table and interpolating in
the $x$ column. We find $\Phi(0.67) = 0.7486$ and $\Phi(0.68) = 0.7517$. Interpolating gives the 0.75 quantile as
0.6745. By symmetry, the 0.25 quantile is $-0.6745$. Similarly we find the 0.9 quantile by interpolation
using $\Phi(1.28) = 0.8997$ and $\Phi(1.29) = 0.9015$. The 0.9 quantile is then 1.282 and the 0.1 quantile is
$-1.282$.


2. Let $Z = (X - 1)/2$. Then $Z$ has the standard normal distribution.


(a) $\Pr(X < 3) = \Pr(Z < 1) = \Phi(1) = 0.8413$.
(b) $\Pr(X > 1.5) = \Pr(Z > 0.25) = 1 - \Phi(0.25) = 0.4013$.
(c) $\Pr(X = 1) = 0$, because $X$ has a continuous distribution.
(d) $\Pr(2 < X < 5) = \Pr(0.5 < Z < 2) = \Phi(2) - \Phi(0.5) = 0.2858$.
(e) $\Pr(X > 0) = \Pr(Z > -0.5) = \Pr(Z < 0.5) = \Phi(0.5) = 0.6915$.
(f) $\Pr(-1 < X < 0.5) = \Pr(-1 < Z < -0.25) = \Pr(0.25 < Z < 1) = \Phi(1) - \Phi(0.25) = 0.2426$.
(g)
\begin{align*}
\Pr(|X| < 2) &= \Pr(-2 < X < 2) = \Pr(-1.5 < Z < 0.5) \\
 &= \Pr(Z < 0.5) - \Pr(Z < -1.5) = \Pr(Z < 0.5) - \Pr(Z > 1.5) \\
 &= \Phi(0.5) - [1 - \Phi(1.5)] = 0.6247.
\end{align*}
(h)
\begin{align*}
\Pr(1 < -2X + 3 < 8) &= \Pr(-2 < -2X < 5) = \Pr(-2.5 < X < 1) \\
 &= \Pr(-1.75 < Z < 0) = \Pr(0 < Z < 1.75) \\
 &= \Phi(1.75) - \Phi(0) = 0.4599.
\end{align*}


3. If $X$ denotes the temperature in degrees Fahrenheit and $Y$ denotes the temperature in degrees Celsius,
then $Y = 5(X - 32)/9$. Since $Y$ is a linear function of $X$, $Y$ will also have a normal distribution.
Also,

\[ E(Y) = \frac{5}{9}(68 - 32) = 20 \quad \text{and} \quad \mathrm{Var}(Y) = \left(\frac{5}{9}\right)^2 (16) = \frac{400}{81}. \]


4. The $q$ quantile of the temperature in degrees Fahrenheit is $68 + 4\Phi^{-1}(q)$. Using Exercise 1, we have
$\Phi^{-1}(0.75) = 0.6745$ and $\Phi^{-1}(0.25) = -0.6745$. So, the 0.25 quantile is 65.302, and the 0.75 quantile is
70.698.


5. Let $A_i$ be the event that chip $i$ lasts at most 290 hours. We want the probability of $\bigcup_{i=1}^{3} A_i^c$, whose
probability is

\[ 1 - \Pr\left(\bigcap_{i=1}^{3} A_i\right) = 1 - \prod_{i=1}^{3} \Pr(A_i). \]

Since the lifetime of each chip has the normal distribution with mean 300 and standard deviation 10,
each $A_i$ has probability

\[ \Phi([290 - 300]/10) = \Phi(-1) = 1 - 0.8413 = 0.1587. \]

So the probability we want is $1 - 0.1587^3 = 0.9960$.
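
The chip calculation can be reproduced without tables. The following is a hedged sketch using Python's `statistics.NormalDist`; the variable names are illustrative and not part of the original solution.

```python
from statistics import NormalDist

lifetime = NormalDist(mu=300, sigma=10)   # lifetime of one chip, in hours

p_at_most_290 = lifetime.cdf(290)         # Pr(A_i) = Phi(-1)
# At least one of three independent chips lasts more than 290 hours.
p_at_least_one_longer = 1 - p_at_most_290 ** 3

print(round(p_at_most_290, 4), round(p_at_least_one_longer, 4))
```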


6. By comparing the given m.g.f. with the m.g.f. of a normal distribution presented in Eq. (5.6.5), we can
see that, for the given m.g.f., $\mu = 0$ and $\sigma^2 = 2$.


7. If $X$ is a measurement having the specified normal distribution, and if $Z = (X - 120)/2$, then $Z$ will
have the standard normal distribution. Therefore, the probability that a particular measurement will
lie in the given interval is

\[ p = \Pr(116 < X < 118) = \Pr(-2 < Z < -1) = \Pr(1 < Z < 2) = \Phi(2) - \Phi(1) = 0.1360. \]

The probability that all three measurements will lie in the interval is $p^3$.


8. Except for a constant factor, this integrand has the form of the p.d.f. of a normal distribution for which
$\mu = 0$ and $\sigma^2 = 1/6$. Therefore, if we multiply the integrand by

\[ \frac{1}{(2\pi)^{1/2}\sigma} = \left(\frac{3}{\pi}\right)^{1/2}, \]

we obtain the p.d.f. of a normal distribution and we know that the integral of this p.d.f. over the entire
real line must be equal to 1. Therefore,

\[ \int_{-\infty}^{\infty} \exp(-3x^2)\,dx = \left(\frac{\pi}{3}\right)^{1/2}. \]

Finally, since the integrand is symmetric with respect to $x = 0$, the integral over the positive half of
the real line must be equal to the integral over the negative half of the real line. Hence,

\[ \int_{0}^{\infty} \exp(-3x^2)\,dx = \frac{1}{2}\left(\frac{\pi}{3}\right)^{1/2}. \]


9. The total length of the rod is $X = A + B + C - 4$. Since $X$ is a linear combination of $A$, $B$, and $C$, it
will also have the normal distribution with

\[ E(X) = 20 + 14 + 26 - 4 = 56 \]

and $\mathrm{Var}(X) = 0.04 + 0.01 + 0.04 = 0.09$. If we let $Z = (X - 56)/0.3$, then $Z$ will have the standard
normal distribution. Hence,

\[ \Pr(55.7 < X < 56.3) = \Pr(-1 < Z < 1) = 2\Phi(1) - 1 = 0.6827. \]


10. We know that $E(\bar{X}_n) = \mu$ and $\mathrm{Var}(\bar{X}_n) = \sigma^2/n = 4/25$. Hence, if we let $Z = (\bar{X}_n - \mu)/(2/5) =
(5/2)(\bar{X}_n - \mu)$, then $Z$ will have the standard normal distribution. Hence,

\[ \Pr(|\bar{X}_n - \mu| < 1) = \Pr(|Z| < 2.5) = 2\Phi(2.5) - 1 = 0.9876. \]


11. If we let $Z = \sqrt{n}(\bar{X}_n - \mu)/2$, then $Z$ will have the standard normal distribution. Therefore,

\[ \Pr(|\bar{X}_n - \mu| < 0.1) = \Pr(|Z| < 0.05\sqrt{n}) = 2\Phi(0.05\sqrt{n}) - 1. \]

This value will be at least 0.9 if $2\Phi(0.05\sqrt{n}) - 1 \ge 0.9$ or $\Phi(0.05\sqrt{n}) \ge 0.95$. It is found from a table
of the values of $\Phi$ that we must therefore have $0.05\sqrt{n} \ge 1.645$. The smallest integer $n$ which satisfies
this inequality is $n = 1083$.
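
The sample-size search can also be carried out numerically. The sketch below is an illustration, not the manual's method: it uses the exact 0.95 quantile from `statistics.NormalDist` in place of the table value 1.645, and the function name `coverage` is hypothetical.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()


def coverage(n, sigma=2.0, half_width=0.1):
    """Pr(|sample mean - mu| < half_width) for a normal sample of size n."""
    z = half_width * math.sqrt(n) / sigma
    return 2 * std_normal.cdf(z) - 1


# Smallest n with 0.05 * sqrt(n) >= Phi^{-1}(0.95).
z_95 = std_normal.inv_cdf(0.95)
n_min = math.ceil((z_95 / 0.05) ** 2)
print(n_min, round(coverage(n_min), 4))
```

Evaluating `coverage` at `n_min - 1` and `n_min` confirms that the probability crosses 0.9 exactly at this sample size.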


12. (a) The general shape is as shown in Fig. S.5.1.


Figure S.5.1: Figure for Exercise 12 of Sec. 5.6.


(b) The sketch remains the same with the scale changed on the $x$-axis so that the points $-1$ and
0 become $-5$ and $-2$, respectively. It turns out that the point $x = 1$ remains fixed in this
transformation.


13. Let $X$ denote the diameter of the bolt and let $Y$ denote the diameter of the nut. Then $Y - X$ will have
the normal distribution for which

\[ E(Y - X) = 2.02 - 2 = 0.02 \]

and

\[ \mathrm{Var}(Y - X) = 0.0016 + 0.0009 = 0.0025. \]

If we let $Z = (Y - X - 0.02)/0.05$, then $Z$ will have the standard normal distribution. Therefore,

\[ \Pr(0 < Y - X < 0.05) = \Pr(-0.4 < Z < 0.6) = \Phi(0.6) - [1 - \Phi(0.4)] = 0.3812. \]


14. Let $\bar{X}$ denote the average of the two scores from university A and let $\bar{Y}$ denote the average of the three
scores from university B. Then $\bar{X}$ has the normal distribution for which

\[ E(\bar{X}) = 625 \quad \text{and} \quad \mathrm{Var}(\bar{X}) = \frac{100}{2} = 50. \]

Also, $\bar{Y}$ has the normal distribution for which

\[ E(\bar{Y}) = 600 \quad \text{and} \quad \mathrm{Var}(\bar{Y}) = \frac{150}{3} = 50. \]

Therefore $\bar{X} - \bar{Y}$ has the normal distribution for which

\[ E(\bar{X} - \bar{Y}) = 625 - 600 = 25 \quad \text{and} \quad \mathrm{Var}(\bar{X} - \bar{Y}) = 50 + 50 = 100. \]

It follows that if we let $Z = (\bar{X} - \bar{Y} - 25)/10$, then $Z$ will have the standard normal distribution. Hence,

\[ \Pr(\bar{X} - \bar{Y} > 0) = \Pr(Z > -2.5) = \Pr(Z < 2.5) = \Phi(2.5) = 0.9938. \]


15. Let $f_1(x)$ denote the p.d.f. of $X$ if the person has glaucoma and let $f_2(x)$ denote the p.d.f. of $X$ if the
person does not have glaucoma. Furthermore, let $A_1$ denote the event that the person has glaucoma
and let $A_2 = A_1^c$ denote the event that the person does not have glaucoma. Then

\[ \Pr(A_1) = 0.1, \quad \Pr(A_2) = 0.9, \]

\[ f_1(x) = \frac{1}{(2\pi)^{1/2}} \exp\left\{-\frac{1}{2}(x - 25)^2\right\} \quad \text{for } -\infty < x < \infty, \]
\[ f_2(x) = \frac{1}{(2\pi)^{1/2}} \exp\left\{-\frac{1}{2}(x - 20)^2\right\} \quad \text{for } -\infty < x < \infty. \]

(a)
\[ \Pr(A_1 \mid X = x) = \frac{\Pr(A_1) f_1(x)}{\Pr(A_1) f_1(x) + \Pr(A_2) f_2(x)}. \]

(b) The value found in part (a) will be greater than 1/2 if and only if

\[ \Pr(A_1) f_1(x) > \Pr(A_2) f_2(x). \]

All of the following inequalities are equivalent to this one:
(i) $\exp\{-(x - 25)^2/2\} > 9\exp\{-(x - 20)^2/2\}$
(ii) $-(x - 25)^2/2 > \log 9 - (x - 20)^2/2$
(iii) $(x - 20)^2 - (x - 25)^2 > 2\log 9$
(iv) $10x - 225 > 2\log 9$
(v) $x > 22.5 + \log(9)/5$.
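
The threshold in (v) can be checked by evaluating the posterior probability from part (a) directly. This is a minimal sketch, assuming the two unit-variance normal densities given above; the function name `posterior_glaucoma` is hypothetical.

```python
import math
from statistics import NormalDist


def posterior_glaucoma(x, prior=0.1):
    """Pr(A_1 | X = x) by Bayes' theorem with the two normal densities."""
    f1 = NormalDist(25, 1).pdf(x)   # density of X for a person with glaucoma
    f2 = NormalDist(20, 1).pdf(x)   # density of X for a person without glaucoma
    return prior * f1 / (prior * f1 + (1 - prior) * f2)


threshold = 22.5 + math.log(9) / 5
print(round(threshold, 4), round(posterior_glaucoma(threshold), 4))
```

The posterior probability equals 1/2 at the threshold and exceeds 1/2 for every larger x.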


16. The given joint p.d.f. is the joint p.d.f. of two random variables that are independent and each of which
has the standard normal distribution. Therefore, $X + Y$ has the normal distribution for which

\[ E(X + Y) = 0 + 0 = 0 \quad \text{and} \quad \mathrm{Var}(X + Y) = 1 + 1 = 2. \]

If we let $Z = (X + Y)/\sqrt{2}$, then $Z$ will have the standard normal distribution. Hence,

\begin{align*}
\Pr(-\sqrt{2} < X + Y < 2\sqrt{2}) &= \Pr(-1 < Z < 2) \\
 &= \Pr(Z < 2) - \Pr(Z < -1) \\
 &= \Phi(2) - [1 - \Phi(1)] = 0.8186.
\end{align*}


17. If $Y = \log X$, then the p.d.f. of $Y$ is

\[ g(y) = \frac{1}{(2\pi)^{1/2}\sigma} \exp\left\{-\frac{1}{2\sigma^2}(y - \mu)^2\right\} \quad \text{for } -\infty < y < \infty. \]

Since $\frac{dy}{dx} = \frac{1}{x}$, it now follows that the p.d.f. of $X$, for $x > 0$, is $f(x) = g(\log x)/x$.
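18. Let $U = X/Y$ and, as a convenient device, let $V = Y$. If we exclude the possibility that $Y = 0$, the
transformation from $X$ and $Y$ to $U$ and $V$ is then one-to-one. (Since $\Pr(Y = 0) = 0$, we can exclude
this possibility.) The inverse transformation is

\[ X = UV \quad \text{and} \quad Y = V. \]

Hence, the Jacobian is

\[ J = \det \begin{bmatrix} \partial x/\partial u & \partial x/\partial v \\ \partial y/\partial u & \partial y/\partial v \end{bmatrix}
 = \det \begin{bmatrix} v & u \\ 0 & 1 \end{bmatrix} = v. \]

Since $X$ and $Y$ are independent and each has the standard normal distribution, their joint p.d.f. $f(x, y)$
is as given in Exercise 16. Therefore, the joint p.d.f. $g(u, v)$ of $U$ and $V$ will be

\[ g(u, v) = f(uv, v)\,|v| = \frac{|v|}{2\pi} \exp\left\{-\frac{1}{2}(u^2 + 1)v^2\right\}. \]

To find the marginal p.d.f. $g_1(u)$ of $U$, we can now integrate $g(u, v)$ over all values of $v$. (The fact that
the single point $v = 0$ was excluded does not affect the value of the integral over the entire real line.)
We have

\begin{align*}
g_1(u) &= \int_{-\infty}^{\infty} \frac{|v|}{2\pi} \exp\left\{-\frac{1}{2}(u^2 + 1)v^2\right\} dv \\
 &= \frac{1}{\pi} \int_{0}^{\infty} v \exp\left\{-\frac{1}{2}(u^2 + 1)v^2\right\} dv \\
 &= \frac{1}{\pi(u^2 + 1)} \quad \text{for } -\infty < u < \infty.
\end{align*}

It can now be seen that $g_1(u)$ is the p.d.f. of a Cauchy distribution as defined in Eq. (4.1.7).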


19. The conditional p.d.f. of $X$ given $\mu$ is

\[ g_1(x \mid \mu) = \frac{1}{(2\pi)^{1/2}} \exp(-(x - \mu)^2/2), \]

while the marginal p.d.f. of $\mu$ is $f_2(\mu) = 0.1$ for $5 \le \mu \le 15$. We need the marginal p.d.f. of $X$, which
we get by integrating $\mu$ out of the joint p.d.f.

\[ g_1(x \mid \mu) f_2(\mu) = \frac{0.1}{(2\pi)^{1/2}} \exp(-(x - \mu)^2/2), \quad \text{for } 5 \le \mu \le 15. \]

The integral is

\[ f_1(x) = \int_5^{15} \frac{0.1}{(2\pi)^{1/2}} \exp(-(x - \mu)^2/2)\,d\mu = 0.1[\Phi(15 - x) - \Phi(5 - x)]. \]

With $x = 8$, the value is $0.1[\Phi(7) - \Phi(-3)] = 0.0999$. This makes the conditional p.d.f. of $\mu$ given
$X = 8$

\[ g_2(\mu \mid 8) = \frac{1.0013}{(2\pi)^{1/2}} \exp(-(8 - \mu)^2/2), \quad \text{for } 5 \le \mu \le 15. \]


20. This probability is the probability that $\log(X) \le \log(6.05) = 1.80$, which equals
$\Phi([1.80 - 3]/1.44^{1/2}) = \Phi(-1) = 0.1587$.


21. Note that $\log(XY) = \log(X) + \log(Y)$. Since $\log(X)$ and $\log(Y)$ are independent with normal distributions,
we have that $\log(XY)$ has the normal distribution with the sum of the means (4.6) and sum of the
variances (10.5). This means that $XY$ has the lognormal distribution with parameters 4.6 and 10.5.


22. Since $\log(1/X) = -\log(X)$, we know that $-\log(X)$ has the normal distribution with mean $-\mu$ and
variance $\sigma^2$. This means that $1/X$ has the lognormal distribution with parameters $-\mu$ and $\sigma^2$.


23. Since $\log(3X^{1/2}) = \log(3) + \log(X)/2$, we know that $\log(3X^{1/2})$ has the normal distribution with mean
$\log(3) + 4.1/2 = 3.149$ and variance $8/4 = 2$. This means that $3X^{1/2}$ has the lognormal distribution
with parameters 3.149 and 2.


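24. First expand the left side of the equation to get

\[ \sum_{i=1}^{n} a_i(x - b_i)^2 + cx = cx + \sum_{i=1}^{n} [a_i x^2 - 2a_i b_i x + a_i b_i^2]. \qquad \text{(S.5.9)} \]

Now collect all the squared and linear terms in $x$. The coefficient of $x^2$ is $\sum_{i=1}^{n} a_i$. The coefficient of $x$
is $c - 2\sum_{i=1}^{n} a_i b_i$. The constant term is $\sum_{i=1}^{n} a_i b_i^2$. This makes (S.5.9) equal to

\[ x^2 \sum_{i=1}^{n} a_i + x\left[c - 2\sum_{i=1}^{n} a_i b_i\right] + \sum_{i=1}^{n} a_i b_i^2. \qquad \text{(S.5.10)} \]

Next, expand each term on the right side of the original equation to produce

\[ \left(\sum_{i=1}^{n} a_i\right)\left[x^2 - 2x\,\frac{\sum_{i=1}^{n} a_i b_i - c/2}{\sum_{i=1}^{n} a_i}
 + \frac{\left(\sum_{i=1}^{n} a_i b_i\right)^2 - c\sum_{i=1}^{n} a_i b_i + c^2/4}{\left(\sum_{i=1}^{n} a_i\right)^2}\right]
 + \sum_{i=1}^{n} a_i b_i^2 - \frac{\left(\sum_{i=1}^{n} a_i b_i\right)^2}{\sum_{i=1}^{n} a_i}
 + \frac{c\sum_{i=1}^{n} a_i b_i - c^2/4}{\sum_{i=1}^{n} a_i}. \]

Combining like terms in this expression produces the same terms that are in (S.5.10).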


Copyright © 2012 Pearson Education, Inc. Publishing as Addison-Wesley. 




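25. Divide the time interval of $u$ years into $n$ intervals of length $u/n$ each. At the end of $n$ such intervals,
the principal gets multiplied by $(1 + ru/n)^n$. The limit of this as $n \to \infty$ is $\exp(ru)$.


26. The integral that defines the mean is

\[ E(X) = \int_{-\infty}^{\infty} x\,\frac{1}{(2\pi)^{1/2}} \exp\left[-\frac{x^2}{2}\right] dx. \]

The integrand is a function $f$ with the property that $f(-x) = -f(x)$. Since the range of integration is
symmetric around 0, the integral is 0. The integral that defines the variance is then

\[ \mathrm{Var}(X) = E(X^2) = \int_{-\infty}^{\infty} x^2\,\frac{1}{(2\pi)^{1/2}} \exp\left[-\frac{x^2}{2}\right] dx. \]

In this integral, let $u = x$ and

\[ dv = x\,\frac{1}{(2\pi)^{1/2}} \exp\left[-\frac{x^2}{2}\right] dx. \]

It is easy to see that $du = dx$ and

\[ v = -\frac{1}{(2\pi)^{1/2}} \exp\left[-\frac{x^2}{2}\right]. \]

Integration by parts yields

\[ \mathrm{Var}(X) = \left. -\frac{x}{(2\pi)^{1/2}} \exp\left[-\frac{x^2}{2}\right] \right|_{x=-\infty}^{\infty}
 + \int_{-\infty}^{\infty} \frac{1}{(2\pi)^{1/2}} \exp\left[-\frac{x^2}{2}\right] dx. \]

The first term on the right-hand side equals 0 at both $\infty$ and $-\infty$. The remaining integral is 1 because it is the
integral of the standard normal p.d.f. So $\mathrm{Var}(X) = 1$.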


5.7 The Gamma Distributions 


Commentary 


Gamma distributions are used in the derivation of the chi-square distribution in Sec. 8.2 and as conjugate 
prior distributions for various parameters. The gamma function arises in several integrals later in the text 
and is interesting in its own right as a generalization of the factorial function to noninteger arguments. 

If one is using the statistical software R, then the function gamma computes the gamma function, and
lgamma computes the logarithm of the gamma function. They take only one argument. The functions dgamma,
pgamma, and qgamma give the p.d.f., the c.d.f., and the quantile function of gamma distributions. The syntax
is that the first argument is the argument of the function, and the next two are $\alpha$ and $\beta$ in the notation
of the text. The function rgamma gives a random sample of gamma random variables. The first argument
is how many you want, and the next two are $\alpha$ and $\beta$. All of the solutions that require the calculation of
gamma probabilities and quantiles can be done using these functions. There are also functions dexp, pexp,
qexp, and rexp that compute similar features for exponential distributions. Just remove the $\alpha$ parameter.
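
Python's standard library provides `math.gamma` and `math.lgamma` as counterparts to R's gamma and lgamma, but it has no built-in gamma c.d.f. or quantile function. The sketch below is therefore limited to the gamma function, a hand-written p.d.f. (the name `gamma_pdf` is hypothetical), and random sampling; the parameter values and seed are arbitrary.

```python
import math
import random


def gamma_pdf(x, alpha, beta):
    """Gamma p.d.f. with shape alpha and rate beta (the text's parametrization)."""
    if x <= 0:
        return 0.0
    log_density = (alpha * math.log(beta) - math.lgamma(alpha)
                   + (alpha - 1) * math.log(x) - beta * x)
    return math.exp(log_density)


print(math.gamma(5))          # Gamma(5) = 4! = 24.0
print(math.gamma(0.5) ** 2)   # Gamma(1/2)^2, which is close to pi

# random.gammavariate takes a SCALE as its second argument, so a rate beta
# is passed as 1 / beta; random.expovariate takes the rate directly.
rng = random.Random(0)
draws = [rng.gammavariate(2.0, 1 / 3.0) for _ in range(5)]
waits = [rng.expovariate(3.0) for _ in range(5)]
```

The difference between rate and scale parametrizations is the main thing to watch when translating the text's $\beta$ into other software.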


Solutions to Exercises 


1. Let $f(x)$ denote the p.d.f. of $X$ and let $Y = cX$. Then $X = Y/c$. Since $dx = dy/c$, then for $y > 0$,

\[ g(y) = \frac{1}{c} f\left(\frac{y}{c}\right) = \frac{1}{c}\,\frac{\beta^\alpha}{\Gamma(\alpha)} \left(\frac{y}{c}\right)^{\alpha-1} \exp(-\beta(y/c))
 = \frac{(\beta/c)^\alpha}{\Gamma(\alpha)}\, y^{\alpha-1} \exp(-(\beta/c)y). \]


2. The c.d.f. of the exponential distribution with parameter $\beta$ is $F(x) = 1 - \exp(-\beta x)$ for $x > 0$. The
inverse of this is the quantile function $F^{-1}(p) = -\log(1 - p)/\beta$.


3. The three p.d.f.'s are in Fig. S.5.2.


Figure S.5.2: Figure for Exercise 3 of Sec. 5.7. Panels (a), (b), and (c) each plot f(x).


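4.
\[ f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, x^{\alpha-1} \exp(-\beta x) \quad \text{for } x > 0. \]
\[ f'(x) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, (\alpha - 1 - \beta x)\, x^{\alpha-2} \exp(-\beta x) \quad \text{for } x > 0. \]

If $\alpha \le 1$, then $f'(x) < 0$ for $x > 0$. Therefore, the maximum value of $f(x)$ occurs at $x = 0$. If $\alpha > 1$,
then $f'(x) = 0$ for $x = (\alpha - 1)/\beta$ and it can be verified that $f(x)$ is actually a maximum at this value
of $x$.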


5. All three p.d.f.’s are in Fig. S.5.3. 


6. Each $X_i$ has the gamma distribution with parameters 1 and $\beta$. Therefore, by Theorem 5.7.7, the sum
$\sum_{i=1}^{n} X_i$ has the gamma distribution with parameters $n$ and $\beta$. Finally, by Exercise 1, $\bar{X}_n = \sum_{i=1}^{n} X_i / n$
has the gamma distribution with parameters $n$ and $n\beta$.


7. Let $A_i = \{X_i > t\}$ for $i = 1, 2, 3$. The event that at least one $X_i$ is greater than $t$ is $\bigcup_{i=1}^{3} A_i$. We could
use the formula in Theorem 1.10.1, or we could use that $\Pr(\bigcup_{i=1}^{3} A_i) = 1 - \Pr(\bigcap_{i=1}^{3} A_i^c)$. The latter is
easier because the $X_i$ are mutually independent and identically distributed.

\[ \Pr\left(\bigcap_{i=1}^{3} A_i^c\right) = \Pr(A_1^c)^3 = [1 - \exp(-\beta t)]^3. \]

So, the probability we want is $1 - [1 - \exp(-\beta t)]^3$.


Figure S.5.3: Figure for Exercise 5 of Sec. 5.7.


8. For any number $y > 0$,

\begin{align*}
\Pr(Y > y) &= \Pr(X_1 > y, \ldots, X_k > y) = \Pr(X_1 > y) \cdots \Pr(X_k > y) \\
 &= \exp(-\beta_1 y) \cdots \exp(-\beta_k y) = \exp(-(\beta_1 + \cdots + \beta_k)y),
\end{align*}

which is the probability that an exponential random variable with parameter $\beta_1 + \cdots + \beta_k$ is greater
than $y$. Hence, $Y$ has that exponential distribution.


9. Let $Y$ denote the length of life of the system. Then by Exercise 8, $Y$ has the exponential distribution
with parameter $0.001 + 0.003 + 0.006 = 0.01$. Therefore,

\[ \Pr(Y > 100) = \exp(-100(0.01)) = \frac{1}{e}. \]


10. Since the mean of the exponential distribution is $\mu$, the parameter is $\beta = 1/\mu$. Therefore, the
distribution of the time until the system fails is an exponential distribution with parameter $n\beta = n/\mu$. The
mean of this distribution is $1/(n\beta) = \mu/n$ and the variance is $1/(n\beta)^2 = (\mu/n)^2$.


11. The length of time $Y_1$ until one component fails has the exponential distribution with parameter $n\beta$.
Therefore, $E(Y_1) = 1/(n\beta)$. The additional length of time $Y_2$ until a second component fails has
the exponential distribution with parameter $(n-1)\beta$. Therefore, $E(Y_2) = 1/[(n-1)\beta]$. Similarly,
$E(Y_3) = 1/[(n-2)\beta]$. The total time until three components fail is $Y_1 + Y_2 + Y_3$ and

\[ E(Y_1 + Y_2 + Y_3) = \left(\frac{1}{n} + \frac{1}{n-1} + \frac{1}{n-2}\right)\frac{1}{\beta}. \]


12. The length of time until the system fails will be $Y_1 + Y_2$, where these variables were defined in
Exercise 11. Therefore,

\[ E(Y_1 + Y_2) = \frac{1}{n\beta} + \frac{1}{(n-1)\beta} = \left(\frac{1}{n} + \frac{1}{n-1}\right)\mu. \]

Also, the variables $Y_1$ and $Y_2$ are independent, because the distribution of $Y_2$ is always the same
exponential distribution regardless of the value of $Y_1$. Therefore,

\[ \mathrm{Var}(Y_1 + Y_2) = \mathrm{Var}(Y_1) + \mathrm{Var}(Y_2) = \frac{1}{(n\beta)^2} + \frac{1}{[(n-1)\beta]^2}
 = \left[\frac{1}{n^2} + \frac{1}{(n-1)^2}\right]\mu^2. \]


13. The time $Y_1$ until one of the students completes the examination has the exponential distribution with
parameter $5\beta = 5/80 = 1/16$. Therefore,

\[ \Pr(Y_1 < 40) = 1 - \exp(-40/16) = 1 - \exp(-5/2) = 0.9179. \]


14. The time $Y_2$ after one student completes the examination until a second student completes it has the
exponential distribution with parameter $4\beta = 4/80 = 1/20$. Therefore,

\[ \Pr(Y_2 < 35) = 1 - \exp(-35/20) = 1 - \exp(-7/4) = 0.8262. \]


15. No matter when the first student completes the examination, the second student to complete the
examination will do so at least 10 minutes later than the first student if $Y_2 > 10$. Similarly, the third
student to complete the examination will do so at least 10 minutes later than the second student if
$Y_3 > 10$. Furthermore, the variables $Y_1$, $Y_2$, $Y_3$, $Y_4$, and $Y_5$ are independent. Therefore, the probability
that no two students will complete the examination within 10 minutes of each other is

\begin{align*}
\Pr(Y_2 > 10, \ldots, Y_5 > 10) &= \Pr(Y_2 > 10) \cdots \Pr(Y_5 > 10) \\
 &= \exp(-(10)4\beta) \exp(-(10)3\beta) \exp(-(10)2\beta) \exp(-10\beta) \\
 &= \exp(-40/80) \exp(-30/80) \exp(-20/80) \exp(-10/80) \\
 &= \exp(-5/4) = 0.2865.
\end{align*}
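
The three examination probabilities follow the same pattern: with k students still working, the waiting time to the next completion is exponential with rate k times the individual rate. The sketch below simply evaluates the formulas above; the variable names are illustrative, and the individual rate 1/80 is the one implied by the parameter 5/80 used in the solution.

```python
import math

beta = 1 / 80  # completion rate for one student, per minute

# First of five students finishes within 40 minutes (rate 5 * beta).
p_first_within_40 = 1 - math.exp(-5 * beta * 40)
# Second finisher arrives within 35 minutes of the first (rate 4 * beta).
p_second_within_35 = 1 - math.exp(-4 * beta * 35)
# Every gap Y2, ..., Y5 exceeds 10 minutes (rates 4, 3, 2, 1 times beta).
p_all_gaps_over_10 = math.prod(math.exp(-10 * k * beta) for k in (4, 3, 2, 1))

print(round(p_first_within_40, 4),
      round(p_second_within_35, 4),
      round(p_all_gaps_over_10, 4))
```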


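16. If $Y = \log(X/x_0)$, then $X = x_0 \exp(Y)$. Also, $dx = x_0 \exp(y)\,dy$ and $x > x_0$ if and only if $y > 0$.
Therefore, for $y > 0$,

\[ g(y) = f(x_0 \exp(y) \mid x_0, \alpha)\, x_0 \exp(y) = \alpha e^{-\alpha y}. \]


17.
\begin{align*}
E[(X - \mu)^{2n}] &= \int_{-\infty}^{\infty} (x - \mu)^{2n}\, \frac{1}{(2\pi)^{1/2}\sigma} \exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\} dx \\
 &= \frac{2}{(2\pi)^{1/2}\sigma} \int_{\mu}^{\infty} (x - \mu)^{2n} \exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\} dx.
\end{align*}

Let $y = (x - \mu)^2$. Then $dx = dy/(2y^{1/2})$ and the above integral can be rewritten as

\[ \frac{2}{(2\pi)^{1/2}\sigma} \int_0^{\infty} y^n \exp\left\{-\frac{y}{2\sigma^2}\right\} \frac{1}{2y^{1/2}}\,dy
 = \frac{1}{(2\pi)^{1/2}\sigma} \int_0^{\infty} y^{n-1/2} \exp\left\{-\frac{y}{2\sigma^2}\right\} dy. \]

The integrand in this integral is the p.d.f. of a gamma distribution with parameters $\alpha = n + 1/2$ and
$\beta = 1/(2\sigma^2)$, except for the constant factor

\[ \frac{\beta^\alpha}{\Gamma(\alpha)} = \frac{1}{(2\sigma^2)^{n+1/2}\,\Gamma(n + 1/2)}. \]

Since the integral of the p.d.f. of the gamma distribution must be equal to 1, it follows that

\[ \int_0^{\infty} y^{n-1/2} \exp\left\{-\frac{y}{2\sigma^2}\right\} dy = (2\sigma^2)^{n+1/2}\,\Gamma(n + 1/2). \]

From Eqs. (5.7.6) and (5.7.9), $\Gamma\left(n + \frac{1}{2}\right) = \left(n - \frac{1}{2}\right)\left(n - \frac{3}{2}\right) \cdots \left(\frac{1}{2}\right)\pi^{1/2}$. Therefore,

\begin{align*}
E[(X - \mu)^{2n}] &= \frac{1}{(2\pi)^{1/2}\sigma}\,(2\sigma^2)^{n+1/2} \left(n - \frac{1}{2}\right)\left(n - \frac{3}{2}\right) \cdots \left(\frac{1}{2}\right)\pi^{1/2} \\
 &= 2^n \left(n - \frac{1}{2}\right)\left(n - \frac{3}{2}\right) \cdots \left(\frac{1}{2}\right)\sigma^{2n} \\
 &= (2n - 1)(2n - 3) \cdots (1)\,\sigma^{2n}.
\end{align*}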
18. For the exponential distribution with parameter $\beta$,

\[ f(x) = \beta \exp(-\beta x) \]

and

\[ 1 - F(x) = \Pr(X > x) = \exp(-\beta x). \]

Therefore, $h(x) = \beta$ for $x > 0$.


19. Let $Y = X^b$. Then $X = Y^{1/b}$ and $dx = \frac{1}{b}\, y^{(1/b)-1}\,dy$. Therefore, for $y > 0$,

\[ g(y) = f(y^{1/b} \mid a, b)\, \frac{1}{b}\, y^{(1/b)-1} = \frac{1}{a^b} \exp(-y/a^b). \]


20. If $X$ has the Weibull distribution with parameters $a$ and $b$, then the c.d.f. of $X$ is

\[ F(x) = \int_0^x \frac{b}{a^b}\, t^{b-1} \exp(-(t/a)^b)\,dt = \left[-\exp(-(t/a)^b)\right]_0^x = 1 - \exp(-(x/a)^b). \]

Therefore,

\[ h(x) = \frac{f(x)}{1 - F(x)} = \frac{b}{a^b}\, x^{b-1}. \]

If $b > 1$, then $h(x)$ is an increasing function of $x$ for $x > 0$, and if $b < 1$, then $h(x)$ is a decreasing
function of $x$ for $x > 0$.


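21. (a) The mean of $1/X$ is

\[ \int_0^{\infty} \frac{1}{x}\, \frac{\beta^\alpha}{\Gamma(\alpha)}\, x^{\alpha-1} \exp(-\beta x)\,dx
 = \frac{\beta^\alpha}{\Gamma(\alpha)} \int_0^{\infty} x^{\alpha-2} \exp(-\beta x)\,dx
 = \frac{\beta^\alpha}{\Gamma(\alpha)}\, \frac{\Gamma(\alpha-1)}{\beta^{\alpha-1}} = \frac{\beta}{\alpha - 1}. \]

(b) The mean of $1/X^2$ is

\[ \int_0^{\infty} \frac{1}{x^2}\, \frac{\beta^\alpha}{\Gamma(\alpha)}\, x^{\alpha-1} \exp(-\beta x)\,dx
 = \frac{\beta^\alpha}{\Gamma(\alpha)} \int_0^{\infty} x^{\alpha-3} \exp(-\beta x)\,dx
 = \frac{\beta^\alpha}{\Gamma(\alpha)}\, \frac{\Gamma(\alpha-2)}{\beta^{\alpha-2}}
 = \frac{\beta^2}{(\alpha - 1)(\alpha - 2)}. \]

This makes the variance of $1/X$ equal to

\[ \frac{\beta^2}{(\alpha - 1)(\alpha - 2)} - \left(\frac{\beta}{\alpha - 1}\right)^2 = \frac{\beta^2}{(\alpha - 1)^2(\alpha - 2)}. \]


22. The conditional p.d.f. of $\lambda$ given $X = x$ can be obtained from Bayes' theorem for random variables
(Theorem 3.6.4). We know

\[ g_1(x \mid \lambda) = \exp(-\lambda t)\, \frac{(\lambda t)^x}{x!}, \quad \text{for } x = 0, 1, \ldots, \]
\[ f_2(\lambda) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, \lambda^{\alpha-1} \exp(-\lambda\beta), \quad \text{for } \lambda > 0. \]

The marginal p.f. of $X$ is

\[ f_1(x) = \frac{t^x \beta^\alpha}{x!\,\Gamma(\alpha)} \int_0^{\infty} \lambda^{\alpha+x-1} \exp(-\lambda(\beta + t))\,d\lambda
 = \frac{t^x \beta^\alpha\, \Gamma(\alpha + x)}{x!\,\Gamma(\alpha)(\beta + t)^{\alpha+x}}. \]

So, the conditional p.d.f. of $\lambda$ given $X = x$ is

\[ g_2(\lambda \mid x) = \frac{(\beta + t)^{\alpha+x}}{\Gamma(\alpha + x)}\, \lambda^{\alpha+x-1} \exp(-\lambda[\beta + t]), \quad \text{for } \lambda > 0, \]

which is easily recognized as the p.d.f. of a gamma distribution with parameters $\alpha + x$ and $\beta + t$.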
23. The memoryless property means that $\Pr(X > t + h \mid X > t) = \Pr(X > h)$.


(a) In terms of the c.d.f. the memoryless property means

\[ \frac{1 - F(t + h)}{1 - F(t)} = 1 - F(h). \]

(b) From (a) we obtain $[1 - F(h)][1 - F(t)] = [1 - F(t + h)]$. Taking logarithms of both sides yields
$\ell(h) + \ell(t) = \ell(t + h)$.

(c) Apply the result in part (b) with $h$ and $t$ both replaced by $t/m$. We obtain $\ell(2t/m) = 2\ell(t/m)$.
Repeat with $t$ replaced by $2t/m$ and $h = t/m$. The result is $\ell(3t/m) = 3\ell(t/m)$. After $k - 1$ such
applications, we obtain

\[ \ell(kt/m) = k\ell(t/m). \qquad \text{(S.5.11)} \]

In particular, when $k = m$, we get $\ell(t) = m\ell(t/m)$ or $\ell(t/m) = \ell(t)/m$. Substituting this into
(S.5.11) we obtain $\ell(kt/m) = (k/m)\ell(t)$.

(d) Let $c > 0$ and let $c_1, c_2, \ldots$ be a sequence of rational numbers that converges to $c$. Since $\ell$ is a
continuous function, $\ell(c_n t) \to \ell(ct)$. But $\ell(c_n t) = c_n \ell(t)$ by part (c) since $c_n$ is rational. It follows
that $c_n \ell(t) \to \ell(ct)$. But, we know that $c_n \ell(t) \to c\ell(t)$. So, $c\ell(t) = \ell(ct)$.

(e) Apply part (d) with $c = 1/t$ to obtain $\ell(t)/t = \ell(1)$, a constant.

(f) Let $\beta = -\ell(1)$. According to part (e), $\ell(t) = -\beta t$ for all $t > 0$. Then $\log[1 - F(x)] = -\beta x$ for $x > 0$.
Solving for $F(x)$ gives $F(x) = 1 - \exp(-\beta x)$, which is the c.d.f. of the exponential distribution with
parameter $\beta = -\ell(1)$.


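24. Let $\psi$ be the m.g.f. of $W_u$. The mean of $S_u$ is

\[ E(S_u) = S_0 E(\exp(\mu u + W_u)) = S_0 \exp(\mu u)\psi(1). \]

(a) Since $W_u$ has the gamma distribution with parameters $\alpha u$ and $\beta > 1$, the m.g.f. is
$\psi(t) = (\beta/[\beta - t])^{\alpha u}$. This makes the mean

\[ E(S_u) = S_0 \exp(\mu u) \left(\frac{\beta}{\beta - 1}\right)^{\alpha u}. \]

So $\exp(-ru)E(S_u) = S_0$ if and only if

\[ \exp((\mu - r)u) \left(\frac{\beta}{\beta - 1}\right)^{\alpha u} = 1. \]

Solving this equation for $\mu$ yields

\[ \mu = r - \alpha \log\left(\frac{\beta}{\beta - 1}\right). \]

(b) Once again, we use the function

\[ h(x) = \begin{cases} x - q & \text{if } x \ge q, \\ 0 & \text{if } x < q. \end{cases} \]

The value of the option at time $u$ is $h(S_u)$. Notice that $S_u \ge q$ if and only if $W_u \ge \log(q/S_0) - \mu u =
c$, as defined in the exercise. Then the present value of the option is

\begin{align*}
\exp(-ru)E[h(S_u)] &= \exp(-ru) \int_c^{\infty} [S_0 \exp(\mu u + w) - q]\, \frac{\beta^{\alpha u}}{\Gamma(\alpha u)}\, w^{\alpha u - 1} \exp(-\beta w)\,dw \\
 &= S_0 \exp([\mu - r]u)\, \frac{\beta^{\alpha u}}{\Gamma(\alpha u)} \int_c^{\infty} w^{\alpha u - 1} \exp(-(\beta - 1)w)\,dw \\
 &\qquad - q \exp(-ru)\, \frac{\beta^{\alpha u}}{\Gamma(\alpha u)} \int_c^{\infty} w^{\alpha u - 1} \exp(-\beta w)\,dw \\
 &= S_0 \exp([\mu - r]u) \left(\frac{\beta}{\beta - 1}\right)^{\alpha u} R(c[\beta - 1]) - q \exp(-ru) R(c\beta) \\
 &= S_0 R(c[\beta - 1]) - q \exp(-ru) R(c\beta).
\end{align*}

(c) We plug the values $u = 1$, $q = S_0$, $r = 0.06$, $\alpha = 1$, and $\beta = 10$ into the previous formula to get

\[ c = \log(10/9) - 0.06 = 0.0454, \]
\[ S_0 \left[R(0.0454 \times 9) - e^{-0.06} R(0.0454 \times 10)\right] = 0.0665\,S_0. \]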
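
The numerical value in part (c) can be reproduced in a few lines. The sketch below rests on an assumption that should be checked against the exercise statement: that $R$ is one minus the c.d.f. of the gamma distribution with parameters $\alpha u$ and 1, which for $\alpha u = 1$ reduces to $R(x) = \exp(-x)$. The names `upper_tail` and `price_per_s0` are hypothetical.

```python
import math

u, r, alpha, beta = 1.0, 0.06, 1.0, 10.0


def upper_tail(x):
    """R(x) for alpha * u = 1, assuming R is one minus the c.d.f. of the
    gamma distribution with parameters alpha * u and 1."""
    return math.exp(-x)


mu = r - alpha * math.log(beta / (beta - 1))   # value of mu from part (a)
c = -mu * u                                     # log(q / S0) = 0 because q = S0
# Option price divided by S0: R(c[beta - 1]) - exp(-r u) R(c beta).
price_per_s0 = upper_tail(c * (beta - 1)) - math.exp(-r * u) * upper_tail(c * beta)

print(round(c, 4), round(price_per_s0, 4))
```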


5.8 The Beta Distributions 


Commentary 


Beta distributions arise as conjugate priors for the parameters of the Bernoulli, binomial, geometric, and 
negative binomial distributions. They also appear in several exercises later in the text, either because of 
their relationship to the t and F' distributions (Exercise 1 in Sec. 8.4 and Exercise 6 in Sec. 9.7) or as 
examples of numerical calculation of M.L.E.’s (Exercise 10 in Sec. 7.6) or calculation of sufficient statistics 
(Exercises 24(h), 24(i) in Sec. 7.3, Exercise 7 in Sec. 7.7, and Exercises 2 and 7(c) in Sec. 7.8). The derivation 
of the p.d.f. of the beta distribution relies on material from Sec. 3.9 (particularly Jacobians) which the 
instructor might have skipped earlier in the course. 

If one is using the statistical software R, then the function beta computes the beta function, and lbeta
computes the logarithm of the beta function. They take only the two necessary arguments. The functions
dbeta, pbeta, and qbeta give the p.d.f., the c.d.f., and the quantile function of beta distributions. The syntax
is that the first argument is the argument of the function, and the next two are $\alpha$ and $\beta$ in the notation of the
text. The function rbeta gives a random sample of beta random variables. The first argument is how many
you want, and the next two are $\alpha$ and $\beta$. All of the solutions that require the calculation of beta probabilities
and quantiles can be done using these functions.
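
Python's standard library has no direct equivalents of beta, lbeta, or dbeta, but they are easy to build from `math.lgamma`. The following is a minimal sketch; the function names `beta_fn`, `beta_pdf`, and `beta_mean` are hypothetical helpers, and no c.d.f. or quantile function is attempted here.

```python
import math


def beta_fn(a, b):
    """Beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b), via lgamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def beta_pdf(x, a, b):
    """Beta p.d.f. with parameters a and b on the interval (0, 1)."""
    if not 0 < x < 1:
        return 0.0
    return x ** (a - 1) * (1 - x) ** (b - 1) / beta_fn(a, b)


def beta_mean(a, b):
    """Mean a / (a + b) of the beta distribution."""
    return a / (a + b)


print(beta_fn(2, 3), beta_pdf(0.5, 1, 1), beta_mean(1, 19))
```

Working on the log scale with `lgamma` avoids overflow when the parameters are large.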


Solutions to Exercises 


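1. The c.d.f. of the beta distribution with parameters $\alpha > 0$ and $\beta = 1$ is

\[ F(x) = \begin{cases} 0 & \text{for } x \le 0, \\ x^\alpha & \text{for } 0 < x < 1, \\ 1 & \text{for } x \ge 1. \end{cases} \]

Setting this equal to $p$ and solving for $x$ yields $F^{-1}(p) = p^{1/\alpha}$.


2.
\[ f'(x \mid \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, [(\alpha - 1)(1 - x) - (\beta - 1)x]\, x^{\alpha-2}(1 - x)^{\beta-2}. \]

Therefore, $f'(x \mid \alpha, \beta) = 0$ for $x = (\alpha - 1)/(\alpha + \beta - 2)$. It can be verified that if $\alpha > 1$ and $\beta > 1$, then
$f(x \mid \alpha, \beta)$ is actually a maximum for this value of $x$.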


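3. The vertical scale is to be chosen in each part of Fig. S.5.4 so that the area under the curve is 1. The
figure in (h) is the mirror image of the figure in (g) with respect to $x = 1/2$.


4. Let $Y = 1 - X$. Then $X = 1 - Y$. Therefore, $|dx/dy| = 1$ and, for $0 < y < 1$,

\[ g(y) = f(1 - y) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, (1 - y)^{\alpha-1} y^{\beta-1}. \]

This is the p.d.f. of the beta distribution with the values $\alpha$ and $\beta$ interchanged.


5.
\begin{align*}
E[X^r(1 - X)^s] &= \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \int_0^1 x^{\alpha+r-1}(1 - x)^{\beta+s-1}\,dx \\
 &= \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, \frac{\Gamma(\alpha + r)\Gamma(\beta + s)}{\Gamma(\alpha + \beta + r + s)} \\
 &= \frac{\Gamma(\alpha + r)}{\Gamma(\alpha)}\, \frac{\Gamma(\beta + s)}{\Gamma(\beta)}\, \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha + \beta + r + s)} \\
 &= \frac{[\alpha(\alpha + 1) \cdots (\alpha + r - 1)][\beta(\beta + 1) \cdots (\beta + s - 1)]}{(\alpha + \beta)(\alpha + \beta + 1) \cdots (\alpha + \beta + r + s - 1)}.
\end{align*}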

Figure S.5.4: Figure for Exercise 3 of Sec. 5.8.


6. The joint p.d.f. of $X$ and $Y$ will be the product of their marginal p.d.f.'s. Therefore, for $x > 0$ and $y > 0$,

\begin{align*}
f(x, y) &= \frac{\beta^{\alpha_1}}{\Gamma(\alpha_1)}\, x^{\alpha_1-1} \exp(-\beta x)\, \frac{\beta^{\alpha_2}}{\Gamma(\alpha_2)}\, y^{\alpha_2-1} \exp(-\beta y) \\
 &= \frac{\beta^{\alpha_1+\alpha_2}}{\Gamma(\alpha_1)\Gamma(\alpha_2)}\, x^{\alpha_1-1} y^{\alpha_2-1} \exp(-\beta(x + y)).
\end{align*}

Also, $X = UV$ and $Y = (1 - U)V$. Therefore, the Jacobian is

\[ J = \det \begin{bmatrix} \partial x/\partial u & \partial x/\partial v \\ \partial y/\partial u & \partial y/\partial v \end{bmatrix}
 = \det \begin{bmatrix} v & u \\ -v & 1 - u \end{bmatrix} = v. \]

As $x$ and $y$ vary over all positive values, $u$ will vary over the interval (0, 1) and $v$ will vary over all
positive values. Hence, for $0 < u < 1$ and $v > 0$, the joint p.d.f. of $U$ and $V$ will be

\[ g(u, v) = f(uv, (1 - u)v)\,v = \frac{\Gamma(\alpha_1 + \alpha_2)}{\Gamma(\alpha_1)\Gamma(\alpha_2)}\, u^{\alpha_1-1}(1 - u)^{\alpha_2-1}
 \cdot \frac{\beta^{\alpha_1+\alpha_2}}{\Gamma(\alpha_1 + \alpha_2)}\, v^{\alpha_1+\alpha_2-1} \exp(-\beta v). \]

It can be seen that this joint p.d.f. has been factored into the product of the p.d.f. of a beta distribution
with parameters $\alpha_1$ and $\alpha_2$ and the p.d.f. of a gamma distribution with parameters $\alpha_1 + \alpha_2$ and $\beta$.
Therefore, $U$ and $V$ are independent, the distribution of $U$ is the specified beta distribution, and the
distribution of $V$ is the specified gamma distribution.


7. Since $X_1$ and $X_2$ each have the gamma distribution with parameters $\alpha = 1$ and $\beta$, it follows from
Exercise 6 that the distribution of $X_1/(X_1 + X_2)$ will be a beta distribution with parameters $\alpha = 1$
and $\beta = 1$. This beta distribution is the uniform distribution on the interval (0, 1).


8. (a) Let $A$ denote the event that the item will be defective. Then

\[ \Pr(A) = \int_0^1 \Pr(A \mid x) f(x)\,dx = \int_0^1 x f(x)\,dx = E(X) = \frac{\alpha}{\alpha + \beta}. \]

(b) Let $B$ denote the event that both items will be defective. Then

\[ \Pr(B) = \int_0^1 \Pr(B \mid x) f(x)\,dx = \int_0^1 x^2 f(x)\,dx = E(X^2) = \frac{\alpha(\alpha + 1)}{(\alpha + \beta)(\alpha + \beta + 1)}. \]


9. Prior to observing the sample, the mean of $P$ is $\alpha/(\alpha + \beta) = 0.05$, which means that $\alpha = \beta/19$. If we use
the result in the note that follows Example 5.8.3, the distribution of $P$ after finding 10 defectives in a
sample of size 10 would be beta with parameters $\alpha + 10$ and $\beta$, whose mean is $(\alpha + 10)/(\alpha + \beta + 10) = 0.9$.
This means that $\alpha = 9\beta - 10$. So $9\beta - 10 = \beta/19$ and $\beta = 19/17$, so $\alpha = 1/17$. The distribution of $P$
is then a beta distribution with parameters 1/17 and 19/17.
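
The two moment conditions in Exercise 9 form a small linear system, which can be solved exactly. This is a brief sketch using Python's `fractions` module; the variable names `a` and `b` stand for $\alpha$ and $\beta$ and are otherwise arbitrary.

```python
from fractions import Fraction

# Prior mean a / (a + b) = 1/20 gives a = b / 19.
# Posterior mean (a + 10) / (a + b + 10) = 9/10 gives a = 9 b - 10.
# Equating the two: 9 b - 10 = b / 19, so b = 10 / (9 - 1/19).
b = Fraction(10) / (9 - Fraction(1, 19))
a = b / 19

prior_mean = a / (a + b)
posterior_mean = (a + 10) / (a + b + 10)
print(a, b)  # 1/17 19/17
```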


10. The distribution of P is a beta distribution with parameters 1 and 1. Applying the note after Exam- 
ple 5.8.3 with n = 25 and x = 6, the conditional distribution of P after observing the data is a beta 
distribution with parameters 7 and 20. 


5.9 The Multinomial Distributions 


Commentary 


The family of multinomial distributions is the only named family of discrete multivariate distributions in the 
text. It arises in finite population sampling problems, but does not figure in the remainder of the text. 

If one is using the statistical software R, then the function dmultinom gives the joint p.f. of a multinomial 
vector. The syntax is that the first argument is the argument of the function and must be a vector of the 
appropriate length with nonnegative integer coordinates. The next argument must be specified as prob= 
followed by the vector of probabilities, which must be a vector of the same length as the first argument. 
The function rmultinom gives a random sample of multinomial random vectors. The first argument is how 
many you want, the next argument specifies what the sum of the coordinates of every vector must be (n 
in the notation of the text), and the third argument is prob as above. All of the solutions that require the 
calculation of multinomial probabilities can be done using these functions.
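
For readers without R, the same two tasks can be sketched in Python using only the standard library. This is an illustrative analogue, not the R functions themselves; the names `multinomial_pmf` and `multinomial_sample` and the example probabilities are assumptions made here.

```python
import random
from math import factorial, prod

def multinomial_pmf(x, prob):
    """Joint p.f. of a multinomial vector evaluated at the count vector x."""
    n = sum(x)
    coef = factorial(n)
    for xi in x:
        coef //= factorial(xi)          # builds n! / (x_1! ... x_k!)
    return coef * prod(p ** xi for p, xi in zip(prob, x))

def multinomial_sample(n, prob, rng=random):
    """Draw one multinomial vector whose coordinates sum to n."""
    counts = [0] * len(prob)
    for i in rng.choices(range(len(prob)), weights=prob, k=n):
        counts[i] += 1
    return counts

prob = [0.2, 0.3, 0.5]                  # hypothetical probability vector
p = multinomial_pmf([1, 1, 2], prob)
draw = multinomial_sample(4, prob)
print(p, draw)
```

The sampler classifies n i.i.d. draws into categories, which mirrors the construction of the multinomial distribution used in the text.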


Solutions to Exercises 


1. Let $Y = X_1 + \cdots + X_\ell$. We shall show that Y has the binomial distribution with parameters n and
$p_1 + \cdots + p_\ell$. Let $Z_1, \ldots, Z_n$ be i.i.d. random variables with the p.f.

$$f(z) = \begin{cases} p_i & \text{for } z = i,\ i = 1, \ldots, k, \\ 0 & \text{otherwise.} \end{cases}$$

For each $i = 1, \ldots, k$ and each $j = 1, \ldots, n$, define

$$A_{ij} = \{Z_j = i\}, \qquad W_{ij} = \begin{cases} 1 & \text{if } A_{ij} \text{ occurs,} \\ 0 & \text{if not.} \end{cases}$$

Finally, define $V_i = \sum_{j=1}^{n} W_{ij}$ for $i = 1, \ldots, k$. It follows from the discussion in the text that $(X_1, \ldots, X_k)$
has the same distribution as $(V_1, \ldots, V_k)$. Hence Y has the same distribution as $U = V_1 + \cdots + V_\ell$. But

$$U = V_1 + \cdots + V_\ell = \sum_{i=1}^{\ell} \sum_{j=1}^{n} W_{ij} = \sum_{j=1}^{n} \sum_{i=1}^{\ell} W_{ij}.$$

Define $U_j = \sum_{i=1}^{\ell} W_{ij}$. It is easy to see that $U_j = 1$ if $\bigcup_{i=1}^{\ell} A_{ij}$ occurs and $U_j = 0$ if not. Also
$\Pr\left(\bigcup_{i=1}^{\ell} A_{ij}\right) = p_1 + \cdots + p_\ell$. Hence, $U_1, \ldots, U_n$ are i.i.d. random variables each having a Bernoulli
distribution with parameter $p_1 + \cdots + p_\ell$. Since $U = \sum_{j=1}^{n} U_j$, we know that U has the binomial distribution
with parameters n and $p_1 + \cdots + p_\ell$.


2. The probability that a given observed value will be less than $a_1$ is $p_1 = F(a_1) = 0.3$, the probability
that it will be between $a_1$ and $a_2$ is $p_2 = F(a_2) - F(a_1) = 0.5$, and the probability that it will be
greater than $a_2$ is $p_3 = 1 - F(a_2) = 0.2$. Therefore, the numbers of the 25 observations in each of
these three intervals will have the multinomial distribution with parameters n = 25 and $p = (p_1, p_2, p_3)$.
Therefore, the required probability is

$$\frac{25!}{6!\,10!\,9!}(0.3)^{6}(0.5)^{10}(0.2)^{9}.$$

3. Let $X_1$ denote the number of times that the number 1 appears, let $X_2$ denote the number of times
that the number 4 appears, and let $X_3$ denote the number of times that a number other than 1 or 4
appears. Then the vector $(X_1, X_2, X_3)$ has the multinomial distribution with parameters n = 5 and
p = (1/6, 1/6, 4/6). Therefore,

$$\Pr(X_1 = X_2) = \Pr(X_1 = 0, X_2 = 0, X_3 = 5) + \Pr(X_1 = 1, X_2 = 1, X_3 = 3) + \Pr(X_1 = 2, X_2 = 2, X_3 = 1)$$

$$= \left(\frac{4}{6}\right)^5 + \frac{5!}{1!\,1!\,3!}\left(\frac{1}{6}\right)\left(\frac{1}{6}\right)\left(\frac{4}{6}\right)^3 + \frac{5!}{2!\,2!\,1!}\left(\frac{1}{6}\right)^2\left(\frac{1}{6}\right)^2\left(\frac{4}{6}\right)$$

$$= \frac{1024}{6^5} + \frac{1280}{6^5} + \frac{120}{6^5} = \frac{2424}{6^5}.$$
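
The sum in Exercise 3 can be confirmed by enumerating the outcomes with $X_1 = X_2$. The sketch below uses exact fractions; the helper name `multinomial_pmf` is an assumption made for illustration.

```python
from fractions import Fraction
from math import factorial

def multinomial_pmf(x, prob):
    """Exact multinomial p.f. at the count vector x."""
    coef = factorial(sum(x))
    for xi in x:
        coef //= factorial(xi)
    value = Fraction(coef)
    for p, xi in zip(prob, x):
        value *= p ** xi
    return value

prob = [Fraction(1, 6), Fraction(1, 6), Fraction(4, 6)]
n = 5
total = Fraction(0)
for x1 in range(n + 1):
    x3 = n - 2 * x1                     # X2 is forced to equal X1
    if x3 >= 0:
        total += multinomial_pmf([x1, x1, x3], prob)
print(total, float(total))
```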


4. Let $X_3$ denote the number of rolls for which the number 5 appears. If $X_1 = 20$ and $X_2 = 15$, then
it must also be true that $X_3 = 5$. The vector $(X_1, X_2, X_3)$ has the multinomial distribution with
parameters

$$n = 40,$$
$$q_1 = p_2 + p_4 + p_6 = 0.30 + 0.05 + 0.07 = 0.42,$$
$$q_2 = p_1 + p_3 = 0.11 + 0.22 = 0.33,$$
$$q_3 = p_5 = 0.25.$$

Therefore,

$$\Pr(X_1 = 20 \text{ and } X_2 = 15) = \Pr(X_1 = 20, X_2 = 15, X_3 = 5) = \frac{40!}{20!\,15!\,5!}(0.42)^{20}(0.33)^{15}(0.25)^{5}.$$
5. The number X of freshmen or sophomores selected will have the binomial distribution with parameters
n = 15 and p = 0.16 + 0.14 = 0.30. Therefore, it is found from the table in the back of the book that

$$\Pr(X \ge 8) = .0348 + .0116 + .0030 + .0006 + .0001 = .0501.$$


6. By Eq. (5.9.3) 


E(X3) = 15(0.38) = 5.7, 

E(X4) = 15(0.32) = 4.8, 

Var(X3) = 15(0.38)(0.62) = 3.534, 

Var(X4) = 15(0.32)(0.68) = 3.264 
By Eq. (5.9.3), 


Cov(X3, X4) = —15(0.38)(0.32) = —1.824. 
Hence, 


E(X3 — X4) =5.7-4.8 =0.9 


and 

Var(X3 — X4) = 3.534 + 3.264 — 2(—1.824) = 10.446. 

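7. For any nonnegative integers $x_1, \ldots, x_k$ such that $\sum_{i=1}^{k} x_i = n$,

$$\Pr\left(X_1 = x_1, \ldots, X_k = x_k \,\Big|\, \sum_{i=1}^{k} X_i = n\right) = \frac{\Pr(X_1 = x_1, \ldots, X_k = x_k)}{\Pr\left(\sum_{i=1}^{k} X_i = n\right)}.$$

Since $X_1, \ldots, X_k$ are independent,

$$\Pr(X_1 = x_1, \ldots, X_k = x_k) = \Pr(X_1 = x_1) \cdots \Pr(X_k = x_k).$$

Since $X_i$ has the Poisson distribution with mean $\lambda_i$,

$$\Pr(X_i = x_i) = \frac{\exp(-\lambda_i)\lambda_i^{x_i}}{x_i!}.$$

Also, by Theorem 5.4.4, the distribution of $\sum_{i=1}^{k} X_i$ will be a Poisson distribution with mean
$\lambda = \sum_{i=1}^{k} \lambda_i$. Therefore,

$$\Pr\left(\sum_{i=1}^{k} X_i = n\right) = \frac{\exp(-\lambda)\lambda^{n}}{n!}.$$

It follows that

$$\Pr\left(X_1 = x_1, \ldots, X_k = x_k \,\Big|\, \sum_{i=1}^{k} X_i = n\right) = \frac{n!}{x_1! \cdots x_k!} \prod_{i=1}^{k} \left(\frac{\lambda_i}{\lambda}\right)^{x_i}.$$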
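
The identity in Exercise 7 can be spot-checked numerically. The sketch below compares the conditional Poisson probability with the multinomial p.f. for one hypothetical choice of means and counts; all names and numbers are illustrative assumptions.

```python
from math import exp, factorial, prod

def poisson_pmf(x, lam):
    return exp(-lam) * lam ** x / factorial(x)

def multinomial_pmf(x, prob):
    coef = factorial(sum(x))
    for xi in x:
        coef //= factorial(xi)
    return coef * prod(p ** xi for p, xi in zip(prob, x))

lams = [1.0, 2.5, 0.5]                  # hypothetical Poisson means
x = [2, 3, 1]                           # hypothetical observed counts
n, lam = sum(x), sum(lams)

# Left side: joint Poisson probability divided by Pr(sum = n).
conditional = prod(poisson_pmf(xi, li) for xi, li in zip(x, lams)) / poisson_pmf(n, lam)
# Right side: multinomial p.f. with probabilities lambda_i / lambda.
multinomial = multinomial_pmf(x, [li / lam for li in lams])
print(conditional, multinomial)
```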
8. Let the data be called $X = (X_1, X_2, X_3)$, with $X_1$ being the number of working parts, $X_2$ being the
number of impaired parts, and $X_3$ being the number of defective parts. The conditional distribution
of X given p is a multinomial distribution with parameters 10 and p. So, the conditional p.f. of the
observed data is

$$g(8, 2, 0 \mid p) = \binom{10}{8, 2, 0} p_1^{8} p_2^{2}.$$

The joint p.f./p.d.f. of X and p is the product of this with the p.d.f. of p:

$$12 p_1^{2} \binom{10}{8, 2, 0} p_1^{8} p_2^{2} = 540 p_1^{10} p_2^{2}.$$

To find the conditional p.d.f. of p given X = (8, 2, 0), we need to divide this expression by the marginal
p.f. of X, which is the integral of this last expression over all $(p_1, p_2)$ such that $p_1, p_2 > 0$ and $p_1 + p_2 < 1$.
This integral can be written as

$$\int_0^1 \int_0^{1-p_1} 540 p_1^{10} p_2^{2}\,dp_2\,dp_1 = \int_0^1 180 p_1^{10}(1 - p_1)^{3}\,dp_1 = 180\,\frac{\Gamma(11)\Gamma(4)}{\Gamma(15)} = 0.0450.$$

For the second equality, we used Theorem 5.8.1. So, the conditional p.d.f. of p given X = (8, 2, 0) is

$$\begin{cases} 12012\,p_1^{10} p_2^{2} & \text{if } 0 < p_1, p_2 < 1 \text{ and } p_1 + p_2 < 1, \\ 0 & \text{otherwise.} \end{cases}$$
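
The normalizing constant in Exercise 8 follows from $\Gamma(k) = (k-1)!$ for positive integers k. A minimal exact check, with illustrative variable names:

```python
from fractions import Fraction
from math import factorial

# Gamma(11) * Gamma(4) / Gamma(15) = 10! * 3! / 14!
beta_integral = Fraction(factorial(10) * factorial(3), factorial(14))
marginal = 180 * beta_integral          # marginal p.f. of the observed data
constant = 540 / marginal               # coefficient of p1^10 * p2^2 in the posterior
print(float(marginal), constant)
```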


5.10 The Bivariate Normal Distributions 


Commentary 


The joint distribution of the least squares estimators in a simple linear regression model (Sec. 11.3) is a 
bivariate normal distribution, as is the posterior distribution of the regression parameters in a Bayesian 
analysis of simple linear regression (Sec. 11.4). It also arises in the regression fallacy (Exercise 19 in Sec. 11.2 
and Exercise 8 in Sec. 11.9) and as another theoretical avenue for introducing regression concepts (Exercises 2 
and 3 in Sec. 11.9). The derivation of the bivariate normal p.d.f. relies on Jacobians from Sec. 3.9 which the 
instructor might have skipped earlier in the course. 


Solutions to Exercises 


1. The conditional distribution of the height of the wife given that the height of the husband is 72 inches is
a normal distribution with mean $66.8 + 0.68 \times 2(72 - 70)/2 = 68.16$ and variance $(1 - 0.68^2)2^2 = 2.1504$.
The 0.95 quantile of this distribution is

$$68.16 + 2.1504^{1/2}\,\Phi^{-1}(0.95) = 68.16 + 1.4664 \times 1.645 = 70.57.$$
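
The quantile in Exercise 1 can be reproduced with Python's standard library. This is a sketch for checking the arithmetic only; the variable names are illustrative.

```python
from statistics import NormalDist

mean = 66.8 + 0.68 * 2 * (72 - 70) / 2       # conditional mean
variance = (1 - 0.68 ** 2) * 2 ** 2          # conditional variance
q95 = NormalDist(mean, variance ** 0.5).inv_cdf(0.95)
print(mean, variance, q95)
```

The small difference from a table-based answer comes from using an unrounded value of $\Phi^{-1}(0.95)$.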


2. Let $X_1$ denote the student's score on test A and let $X_2$ denote his score on test B. The conditional
distribution of $X_2$ given that $X_1 = 80$ is a normal distribution with mean
$90 + (0.8)(16)\left(\frac{80 - 85}{10}\right) = 83.6$ and variance $(1 - 0.64)(256) = 92.16$. Therefore, given that $X_1 = 80$, the random variable
$Z = (X_2 - 83.6)/9.6$ will have the standard normal distribution. It follows that

$$\Pr(X_2 > 90 \mid X_1 = 80) = \Pr\left(Z > \frac{2}{3}\right) = 1 - \Phi\left(\frac{2}{3}\right) = 0.2524.$$
3. The sum $X_1 + X_2$ will have the normal distribution with mean 85 + 90 = 175 and variance
$(10)^2 + (16)^2 + 2(0.8)(10)(16) = 612$. Therefore, $Z = (X_1 + X_2 - 175)/24.7386$ will have the standard normal
distribution. It follows that

$$\Pr(X_1 + X_2 > 200) = \Pr(Z > 1.0106) = 1 - \Phi(1.0106) = 0.1562.$$

4. The difference $X_1 - X_2$ will have the normal distribution with mean $85 - 90 = -5$ and variance
$(10)^2 + (16)^2 - 2(0.8)(10)(16) = 100$. Therefore, $Z = (X_1 - X_2 + 5)/10$ will have the standard normal
distribution. It follows that

$$\Pr(X_1 > X_2) = \Pr(X_1 - X_2 > 0) = \Pr(Z > 0.5) = 1 - \Phi(0.5) = 0.3085.$$

5. The predicted value should be the mean of the conditional distribution of $X_1$ given that $X_2 = 100$.
This value is $85 + (0.8)(10)\left(\frac{100 - 90}{16}\right) = 90$. The M.S.E. for this prediction is the variance of the
conditional distribution, which is $(1 - 0.64)100 = 36$.

6. $\mathrm{Var}(X_1 + bX_2) = \sigma_1^2 + b^2\sigma_2^2 + 2b\rho\sigma_1\sigma_2$. This is a quadratic function of b. By differentiating with respect
to b and setting the derivative equal to 0, we obtain the value $b = -\rho\sigma_1/\sigma_2$.


7. Since $E(X_1 \mid X_2) = 3.7 - 0.15X_2$, it follows from Eq. (5.10.8) that

(i) $\mu_1 - \rho\frac{\sigma_1}{\sigma_2}\mu_2 = 3.7$,

(ii) $\rho\frac{\sigma_1}{\sigma_2} = -0.15$.

Since $E(X_2 \mid X_1) = 0.4 - 0.6X_1$, it follows from Eq. (5.10.6) that

(iii) $\mu_2 - \rho\frac{\sigma_2}{\sigma_1}\mu_1 = 0.4$,

(iv) $\rho\frac{\sigma_2}{\sigma_1} = -0.6$.

Finally, since $\mathrm{Var}(X_2 \mid X_1) = 3.64$, it follows that

(v) $(1 - \rho^2)\sigma_2^2 = 3.64$.

By multiplying (ii) and (iv) we find that $\rho^2 = 0.09$. Therefore, $\rho = \pm 0.3$. Since the right side of (ii)
is negative, $\rho$ must be negative also. Hence, $\rho = -0.3$. It now follows from (v) that $\sigma_2^2 = 4$. Hence,
$\sigma_2 = 2$ and it is found from (ii) that $\sigma_1 = 1$. By using the values we have obtained, we can rewrite (i)
and (iii) as follows:

(i) $\mu_1 + 0.15\mu_2 = 3.7$,

(iii) $0.6\mu_1 + \mu_2 = 0.4$.

By solving these two simultaneous linear equations, we find that $\mu_1 = 4$ and $\mu_2 = -2$.
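
The chain of substitutions in Exercise 7 can be followed step by step in a few lines. This is a sketch of the arithmetic only; the variable names are illustrative.

```python
rho_sq = (-0.15) * (-0.6)                # product of (ii) and (iv)
rho = -rho_sq ** 0.5                     # negative because the right side of (ii) is negative
sigma2 = (3.64 / (1 - rho_sq)) ** 0.5    # from (v)
sigma1 = -0.15 * sigma2 / rho            # from (ii)

# Solve (i) mu1 + 0.15*mu2 = 3.7 and (iii) 0.6*mu1 + mu2 = 0.4 by Cramer's rule.
det = 1 * 1 - 0.15 * 0.6
mu1 = (3.7 * 1 - 0.15 * 0.4) / det
mu2 = (1 * 0.4 - 0.6 * 3.7) / det
print(rho, sigma1, sigma2, mu1, mu2)
```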


8. The value of $f(x_1, x_2)$ will be a maximum when the exponent inside the curly braces is a maximum. In
turn, this exponent will be a maximum when the expression inside the square brackets is a minimum.
If we let

$$a_1 = \frac{x_1 - \mu_1}{\sigma_1} \quad\text{and}\quad a_2 = \frac{x_2 - \mu_2}{\sigma_2},$$

then this expression is

$$a_1^2 - 2\rho a_1 a_2 + a_2^2.$$

Copyright © 2012 Pearson Education, Inc. Publishing as Addison-Wesley. 


We shall now show that this expression must be nonnegative. We have

$$0 \le (|a_1| - |a_2|)^2 = a_1^2 + a_2^2 - 2|a_1 a_2| \le a_1^2 + a_2^2 - 2|\rho a_1 a_2|,$$

since $|\rho| < 1$. Furthermore, $|\rho a_1 a_2| \ge \rho a_1 a_2$. Hence,

$$0 \le a_1^2 + a_2^2 - 2\rho a_1 a_2.$$

The minimum possible value of $a_1^2 - 2\rho a_1 a_2 + a_2^2$ is therefore 0, and this value is attained when $a_1 = 0$
and $a_2 = 0$ or, equivalently, when $x_1 = \mu_1$ and $x_2 = \mu_2$.


9. Let $a_1$ and $a_2$ be as defined in Exercise 8. If $f(x_1, x_2) = k$, then $a_1^2 - 2\rho a_1 a_2 + a_2^2 = b^2$, where b is a
particular positive constant. Suppose first that $\rho = 0$ and $\sigma_1 = \sigma_2 = \sigma$. Then this equation has the
form

$$(x_1 - \mu_1)^2 + (x_2 - \mu_2)^2 = b^2\sigma^2.$$

This is the equation of a circle with center at $(\mu_1, \mu_2)$ and radius $b\sigma$. Suppose next that $\rho = 0$ and
$\sigma_1 \ne \sigma_2$. Then the equation has the form

$$\frac{(x_1 - \mu_1)^2}{\sigma_1^2} + \frac{(x_2 - \mu_2)^2}{\sigma_2^2} = b^2.$$

This is the equation of an ellipse for which the center is $(\mu_1, \mu_2)$ and the major and minor axes are parallel
to the $x_1$ and $x_2$ axes. Suppose finally that $\rho \ne 0$. It was shown in Exercise 8 that $a_1^2 - 2\rho a_1 a_2 + a_2^2 \ge 0$
for all values of $a_1$ and $a_2$. It therefore follows from the methods of analytic geometry and elementary
calculus that the set of points which satisfy the equation

$$\frac{(x_1 - \mu_1)^2}{\sigma_1^2} - 2\rho\,\frac{(x_1 - \mu_1)(x_2 - \mu_2)}{\sigma_1\sigma_2} + \frac{(x_2 - \mu_2)^2}{\sigma_2^2} = b^2$$

will be an ellipse for which the center is $(\mu_1, \mu_2)$ and for which the major and minor axes are rotated
so that they are not parallel to the $x_1$ and $x_2$ axes.


10. Let $\Delta = \det\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$. Since $\Delta \ne 0$, the transformation from $X_1$ and $X_2$ to $Y_1$ and $Y_2$ is a one-to-one
transformation, for which the inverse transformation is:

$$X_1 = \frac{1}{\Delta}\left[a_{22}(Y_1 - b_1) - a_{12}(Y_2 - b_2)\right],$$
$$X_2 = \frac{1}{\Delta}\left[-a_{21}(Y_1 - b_1) + a_{11}(Y_2 - b_2)\right].$$

The joint p.d.f. of $Y_1$ and $Y_2$ can therefore be obtained by replacing $x_1$ and $x_2$ in $f(x_1, x_2)$ by their
expressions in terms of $y_1$ and $y_2$, and then multiplying the result by the constant $1/|\Delta|$. After a great
deal of algebra the exponent in this joint p.d.f. can be put into the following form:

$$-\frac{1}{2(1 - r^2)}\left[\left(\frac{y_1 - m_1}{s_1}\right)^2 - 2r\left(\frac{y_1 - m_1}{s_1}\right)\left(\frac{y_2 - m_2}{s_2}\right) + \left(\frac{y_2 - m_2}{s_2}\right)^2\right],$$

where

$$m_1 = E(Y_1) = a_{11}\mu_1 + a_{12}\mu_2 + b_1,$$
$$m_2 = E(Y_2) = a_{21}\mu_1 + a_{22}\mu_2 + b_2,$$
$$s_1^2 = \mathrm{Var}(Y_1) = a_{11}^2\sigma_1^2 + a_{12}^2\sigma_2^2 + 2a_{11}a_{12}\rho\sigma_1\sigma_2,$$
$$s_2^2 = \mathrm{Var}(Y_2) = a_{21}^2\sigma_1^2 + a_{22}^2\sigma_2^2 + 2a_{21}a_{22}\rho\sigma_1\sigma_2,$$
$$r = \frac{\mathrm{Cov}(Y_1, Y_2)}{s_1 s_2} = \frac{1}{s_1 s_2}\left[a_{11}a_{21}\sigma_1^2 + (a_{11}a_{22} + a_{12}a_{21})\rho\sigma_1\sigma_2 + a_{12}a_{22}\sigma_2^2\right].$$

It can then be concluded that this joint p.d.f. is the p.d.f. of a bivariate normal distribution for which
the means are $m_1$ and $m_2$, the variances are $s_1^2$ and $s_2^2$, and the correlation is r.


11. By Exercise 10, the joint distribution of $X_1 + X_2$ and $X_1 - X_2$ is a bivariate normal distribution. By
Exercise 9 of Sec. 4.6, these two variables are uncorrelated. Therefore, they are also independent.


12. (a) For the first species, the mean of $a_1X_1 + a_2X_2$ is $201a_1 + 118a_2$, while the variance is

$$15.2^2a_1^2 + 6.6^2a_2^2 + 2 \times 15.2 \times 6.6 \times 0.64\,a_1a_2.$$

The square root of this is the standard deviation, $(231.04a_1^2 + 43.56a_2^2 + 128.41a_1a_2)^{1/2}$. For the
second species, the mean is $187a_1 + 131a_2$. The standard deviation will be the same as for the
first species because the values of $\sigma_1$, $\sigma_2$ and $\rho$ are the same for both species.

(b) At first, it looks like we need a two-dimensional maximization. However, it is clear that the ratio
in question, namely,

$$\frac{-14a_1 + 13a_2}{(231.04a_1^2 + 43.56a_2^2 + 128.41a_1a_2)^{1/2}}, \qquad \text{(S.5.12)}$$

will have the same value if we multiply both $a_1$ and $a_2$ by the same positive constant. We could
then assume that the pair $(a_1, a_2)$ lies on a circle and hence reduce the maximization to a one-dimensional
problem. Alternatively, we could assume that $a_1 + a_2 = 1$ and then find the maximum
of the square of (S.5.12). (We would also have to check the one extra case in which $a_1 = -a_2$ to
see if that produced a larger value.) We shall use this second approach. If we replace $a_2$ by $1 - a_1$,
we need to find the maximum of

$$\frac{(13 - 27a_1)^2}{231.04a_1^2 + 43.56(1 - a_1)^2 + 128.41a_1(1 - a_1)}.$$

The derivative of this is the ratio of two polynomials, the denominator of which is always positive.
So, the derivative is 0 when the numerator is 0. The numerator of the derivative is $13 - 27a_1$
times a linear function of $a_1$. The two roots of the numerator are 0.4815 and $-0.5878$. The first
root produces the value 0 for (S.5.12), while the second produces the value 3.456. All pairs with
$a_1 = -a_2$ lead to the values $\pm 2.233$. So $a_1 = -0.5878$ and $a_2 = 1.5878$ provide the maximum of
(S.5.12).
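
The first approach mentioned in part (b), restricting $(a_1, a_2)$ to a circle, lends itself to a simple numerical check of the maximum. The sketch below uses a plain grid search over the angle; the grid size and all names are assumptions made for illustration.

```python
from math import cos, pi, sin, sqrt

def ratio(a1, a2):
    """The ratio (S.5.12) for a given pair (a1, a2)."""
    num = -14 * a1 + 13 * a2
    den = sqrt(231.04 * a1 ** 2 + 43.56 * a2 ** 2 + 128.41 * a1 * a2)
    return num / den

steps = 100000                           # hypothetical grid resolution
angles = (2 * pi * i / steps for i in range(steps))
best_value, theta = max((ratio(cos(t), sin(t)), t) for t in angles)

# Rescale the maximizer so that a1 + a2 = 1, as in the solution.
a1, a2 = cos(theta), sin(theta)
scaled = (a1 / (a1 + a2), a2 / (a1 + a2))
print(best_value, scaled)
```

Because the ratio is unchanged by positive rescaling, the point on the circle and the rescaled pair give the same value.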


13. The exponent of a bivariate normal p.d.f. can be expressed as $-[ax^2 + by^2 + cxy + ex + gy + h]$, where

$$a = \frac{1}{2\sigma_1^2(1 - \rho^2)},$$
$$b = \frac{1}{2\sigma_2^2(1 - \rho^2)},$$
$$c = -\frac{\rho}{\sigma_1\sigma_2(1 - \rho^2)},$$
$$e = -\frac{\mu_1}{\sigma_1^2(1 - \rho^2)} + \frac{\mu_2\rho}{\sigma_1\sigma_2(1 - \rho^2)},$$
$$g = -\frac{\mu_2}{\sigma_2^2(1 - \rho^2)} + \frac{\mu_1\rho}{\sigma_1\sigma_2(1 - \rho^2)},$$

and h is irrelevant because exp(−h) just provides an additional constant factor that we are ignoring
anyway. The only restrictions that the bivariate normal p.d.f. puts on the numbers a, b, c, e, and g
are that a, b > 0 and whatever is equivalent to $|\rho| < 1$. It is easy to see that, so long as a, b > 0, we
will have $|\rho| < 1$ if and only if $ab > (c/2)^2$. Hence, every set of numbers that satisfies these inequalities
corresponds to a bivariate normal p.d.f. Assuming that these inequalities are satisfied, we can solve the
above equations to find the parameters of the bivariate normal distribution.

$$\rho = -\frac{c/2}{(ab)^{1/2}},$$
$$\sigma_1^2 = \frac{1}{2a - c^2/[2b]},$$
$$\sigma_2^2 = \frac{1}{2b - c^2/[2a]},$$
$$\mu_1 = \frac{cg - 2be}{4ab - c^2},$$
$$\mu_2 = \frac{ce - 2ag}{4ab - c^2}.$$
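
A round trip between the two parameterizations in Exercise 13 gives a quick consistency check of the formulas. This is a sketch; the function names and the test parameters are assumptions made for illustration.

```python
def to_coefficients(mu1, mu2, s1, s2, rho):
    """Coefficients a, b, c, e, g of the exponent from the usual parameters."""
    k = 1 - rho ** 2
    a = 1 / (2 * s1 ** 2 * k)
    b = 1 / (2 * s2 ** 2 * k)
    c = -rho / (s1 * s2 * k)
    e = -mu1 / (s1 ** 2 * k) + mu2 * rho / (s1 * s2 * k)
    g = -mu2 / (s2 ** 2 * k) + mu1 * rho / (s1 * s2 * k)
    return a, b, c, e, g

def to_parameters(a, b, c, e, g):
    """Invert the map above, assuming a, b > 0 and ab > (c/2)^2."""
    rho = -(c / 2) / (a * b) ** 0.5
    var1 = 1 / (2 * a - c ** 2 / (2 * b))
    var2 = 1 / (2 * b - c ** 2 / (2 * a))
    mu1 = (c * g - 2 * b * e) / (4 * a * b - c ** 2)
    mu2 = (c * e - 2 * a * g) / (4 * a * b - c ** 2)
    return mu1, mu2, var1 ** 0.5, var2 ** 0.5, rho

params = (4.0, -2.0, 1.0, 2.0, -0.3)     # hypothetical (mu1, mu2, sigma1, sigma2, rho)
recovered = to_parameters(*to_coefficients(*params))
print(recovered)
```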


14. The marginal p.d.f. of X is

$$f_1(x) = \frac{1}{(2\pi)^{1/2}\sigma}\exp\left(-\frac{1}{2\sigma^2}[x - \mu]^2\right),$$

where $\mu$ and $\sigma^2$ are the mean and variance of X. The conditional p.d.f. of Y given X = x is

$$g_2(y \mid x) = \frac{1}{(2\pi)^{1/2}\tau}\exp\left(-\frac{1}{2\tau^2}[y - ax - b]^2\right).$$

The joint p.d.f. of (X, Y) is the product of these two

$$f(x, y) = \frac{1}{2\pi\sigma\tau}\exp\left(-[a'x^2 + b'y^2 + cxy + ex + gy + h]\right),$$

where

$$a' = \frac{1}{2\sigma^2} + \frac{a^2}{2\tau^2},$$
$$b' = \frac{1}{2\tau^2},$$
$$c = -\frac{a}{\tau^2},$$
$$e = -\frac{\mu}{\sigma^2} + \frac{ab}{\tau^2},$$
$$g = -\frac{b}{\tau^2},$$

and h is irrelevant since we are going to apply the result from Exercise 13. Clearly a' and b' are positive.
We only need to check that $a'b' > (c/2)^2$. Notice that

$$a'b' = \frac{a^2}{4\tau^4} + \frac{1}{4\sigma^2\tau^2} = (c/2)^2 + \frac{1}{4\sigma^2\tau^2},$$

so the conditions of Exercise 13 are met.


15. (a) Let $Y = \sum_{j \ne i} X_j$. Since $X_1, \ldots, X_n$ are independent, we know that Y is independent of $X_i$. Since
Y is the sum of independent normal random variables it has a normal distribution. The mean
and variance of Y are easily seen to be $(n - 1)\mu$ and $(n - 1)\sigma^2$ respectively. Since Y and $X_i$ are
independent, all pairs of linear combinations of them have a bivariate normal distribution. Now
write

$$X_i = 1X_i + 0Y,$$
$$\overline{X}_n = \frac{1}{n}X_i + \frac{1}{n}Y.$$

Clearly, both $X_i$ and $\overline{X}_n$ have mean $\mu$, and we already know that $X_i$ has variance $\sigma^2$ while $\overline{X}_n$ has
variance $\sigma^2/n$. The correlation can be computed from the covariance of the two linear combinations.

$$\mathrm{Cov}\left(1X_i + 0Y, \frac{1}{n}X_i + \frac{1}{n}Y\right) = \frac{1}{n}\sigma^2.$$

The correlation is then $(\sigma^2/n)/[\sigma^2\sigma^2/n]^{1/2} = 1/n^{1/2}$.

(b) The conditional distribution of $X_i$ given $\overline{X}_n = \overline{x}_n$ is a normal distribution with mean equal to

$$\mu + \frac{1}{n^{1/2}}\,\frac{\sigma}{\sigma/n^{1/2}}\,(\overline{x}_n - \mu) = \overline{x}_n.$$

The conditional variance is

$$\sigma^2\left(1 - \frac{1}{n}\right).$$


5.11 Supplementary Exercises 


Solutions to Exercises 


1. Let $g_1(x \mid p)$ be the conditional p.f. of X given P = p, which is the binomial p.f. with parameters n and
p. Let $f_2(p)$ be the marginal p.d.f. of P, which is the beta p.d.f. with parameters 1 and 1, also known as
the uniform p.d.f. on the interval [0, 1]. According to the law of total probability for random variables,
the marginal p.f. of X is

$$f_1(x) = \int g_1(x \mid p) f_2(p)\,dp = \int_0^1 \binom{n}{x} p^x (1 - p)^{n-x}\,dp = \binom{n}{x}\frac{x!\,(n - x)!}{(n + 1)!} = \frac{1}{n + 1},$$

for $x = 0, \ldots, n$. In the above, we used Theorem 5.8.1 and the fact that $\Gamma(k + 1) = k!$ for each integer
k.


2. The random variable $U = 3X + 2Y - 6Z$ has the normal distribution with mean 0 and variance
$3^2 + 2^2 + 6^2 = 49$. Therefore, $Z = U/7$ has the standard normal distribution. The required probability
is

$$\Pr(U < -7) = \Pr(Z < -1) = 1 - \Phi(1) = .1587.$$
3. Since $\mathrm{Var}(X) = E(X) = \lambda_1$ and $\mathrm{Var}(Y) = E(Y) = \lambda_2$, it follows that $\lambda_1 + \lambda_2 = 5$. Hence, X + Y has
the Poisson distribution with mean 5 and

$$\Pr(X + Y < 2) = \Pr(X + Y = 0) + \Pr(X + Y = 1) = \exp(-5) + 5\exp(-5) = .0067 + .0337 = .0404.$$


4. It can be found from the table of the standard normal distribution that 116 must be .84 standard
deviations to the left of the mean and 328 must be 1.28 standard deviations to the right of the mean.
Hence,

$$\mu - .84\sigma = 116,$$
$$\mu + 1.28\sigma = 328.$$

Solving these equations, we obtain $\mu = 200$ and $\sigma = 100$, $\sigma^2 = 10{,}000$.


5. The event $\{\overline{X}_n < 1/2\}$ can occur only if all four observations are 0, which has probability $(\exp(-\lambda))^4$,
or three of the observations are 0 and the other is 1, which has probability $4(\lambda\exp(-\lambda))(\exp(-\lambda))^3$.
Hence, the total probability is as given in this exercise.


6. If X has the exponential distribution with parameter $\beta$, then

$$.25 = \Pr(X > 1000) = \exp(-(1000)\beta).$$

Hence, $\beta = \frac{1}{1000}\log 4$ and $E(X) = \frac{1}{\beta} = \frac{1000}{\log 4}$.

7. It follows from Exercise 18 of Sec. 4.9 that

$$E[(X - \mu)^3] = E(X^3) - 3\mu\sigma^2 - \mu^3.$$

Because of the symmetry of the normal distribution with respect to $\mu$, the left side of this relation is
0. Hence,

$$E(X^3) = 3\mu\sigma^2 + \mu^3.$$


8. $\overline{X}$ and $\overline{Y}$ have independent normal distributions with the same mean $\mu$, and $\mathrm{Var}(\overline{X}) = 144/16 = 9$,
$\mathrm{Var}(\overline{Y}) = 400/25 = 16$. Hence, $\overline{X} - \overline{Y}$ has the normal distribution with mean 0 and variance
9 + 16 = 25. Thus, $Z = (\overline{X} - \overline{Y})/5$ has the standard normal distribution. It follows that the required
probability is $\Pr(|Z| < 1) = .6826$.


9. The number of men that arrive during the one-minute period has the Poisson distribution with mean
2. The number of women is independent of the number of men and has the Poisson distribution with
mean 1. Therefore, the total number of people X that arrive has the Poisson distribution with mean
3. From the table in the back of the book it is found that

$$\Pr(X \le 4) = .0498 + .1494 + .2240 + .2240 + .1680 = .8152.$$
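10. $$\psi_Y(t) = E(\exp(tY)) = E[\exp(tX_1 + \cdots + tX_N)]$$
$$= E\{E[\exp(tX_1 + \cdots + tX_N) \mid N]\}$$
$$= E\{[\psi(t)]^N\}$$
$$= \sum_{x=0}^{\infty} [\psi(t)]^x\,\frac{\exp(-\lambda)\lambda^x}{x!}$$
$$= \exp(-\lambda)\sum_{x=0}^{\infty} \frac{[\lambda\psi(t)]^x}{x!}$$
$$= \exp(-\lambda)\exp(\lambda\psi(t)) = \exp\{\lambda[\psi(t) - 1]\}.$$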


11. The probability that at least one of the two children will be successful on a given Sunday is (1/3) +
(1/5) - (1/3)(1/5) = 7/15. Therefore, from the geometric distribution, the expected number of Sundays
until a successful launch is achieved is 15/7. 


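12. For any positive integer n, the event X > n will occur if and only if the first n tosses are either all
heads or all tails. Therefore,

$$\Pr(X > n) = \left(\frac{1}{2}\right)^n + \left(\frac{1}{2}\right)^n = \left(\frac{1}{2}\right)^{n-1},$$

and, for n = 2, 3, ...,

$$\Pr(X = n) = \Pr(X > n - 1) - \Pr(X > n) = \left(\frac{1}{2}\right)^{n-2} - \left(\frac{1}{2}\right)^{n-1} = \left(\frac{1}{2}\right)^{n-1}.$$

Hence,

$$f(x) = \begin{cases} \left(\frac{1}{2}\right)^{x-1} & \text{for } x = 2, 3, 4, \ldots, \\ 0 & \text{otherwise.} \end{cases}$$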


13. By the Poisson approximation, the distribution of X is approximately Poisson with mean 120(1/36) =
10/3. The probability that such a Poisson random variable equals 3 is $\exp(-10/3)(10/3)^3/3! = 0.2202$.
(The actual binomial probability is 0.2229.)
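
Both numbers quoted in this solution can be recomputed directly. A minimal sketch with illustrative variable names:

```python
from math import comb, exp, factorial

n, p, k = 120, 1 / 36, 3
lam = n * p                                       # Poisson mean 10/3
poisson = exp(-lam) * lam ** k / factorial(k)     # Poisson approximation
binomial = comb(n, k) * p ** k * (1 - p) ** (n - k)   # exact binomial probability
print(round(poisson, 4), round(binomial, 4))
```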


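14. It was shown in Sec. 3.9 that the p.d.f.'s of $Y_1$, $Y_n$, and W, respectively, are as follows:

$$g_1(y) = n(1 - y)^{n-1} \quad\text{for } 0 < y < 1,$$
$$g_n(y) = ny^{n-1} \quad\text{for } 0 < y < 1,$$
$$h_1(w) = n(n - 1)w^{n-2}(1 - w) \quad\text{for } 0 < w < 1.$$

Each of these is the p.d.f. of a beta distribution. For $g_1$, $\alpha = 1$ and $\beta = n$. For $g_n$, $\alpha = n$ and $\beta = 1$.
For $h_1$, $\alpha = n - 1$ and $\beta = 2$.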


15. (a) $\Pr(T_1 > t) = \Pr(X = 0)$, where X is the number of occurrences between time 0 and time t. Since
X has the Poisson distribution with mean 5t, it follows that $\Pr(T_1 > t) = \exp(-5t)$. Hence, $T_1$
has the exponential distribution with parameter $\beta = 5$.

(b) $T_k$ is the sum of k i.i.d. random variables, each of which has the exponential distribution given
in part (a). Therefore, the distribution of $T_k$ is a gamma distribution with parameters $\alpha = k$ and
$\beta = 5$.

(c) Let $X_i$ denote the time following the ith occurrence until the (i + 1)st occurrence. Then the
random variables $X_1, \ldots, X_{k-1}$ are i.i.d., each of which has the exponential distribution given in
part (a). Since t is measured in hours in that distribution, the required probability is

$$\Pr\left(X_i > \frac{1}{3},\ i = 1, \ldots, k - 1\right) = (\exp(-5/3))^{k-1}.$$


16. We can express $T_5$ as $T_1 + V$, where V is the time required after one of the components has failed until
the other four have failed. By the memoryless property of the exponential distribution, we know that
$T_1$ and V are independent. Therefore,

$$\mathrm{Cov}(T_1, T_5) = \mathrm{Cov}(T_1, T_1 + V) = \mathrm{Cov}(T_1, T_1) + \mathrm{Cov}(T_1, V) = \mathrm{Var}(T_1) + 0 = \frac{1}{25\beta^2},$$

since $T_1$ has the exponential distribution with parameter $5\beta$.


17. $$\Pr(X_1 > kX_2) = \int_0^\infty \Pr(X_1 > kX_2 \mid X_2 = x) f_2(x)\,dx = \int_0^\infty \exp(-\beta_1 kx)\,\beta_2\exp(-\beta_2 x)\,dx = \frac{\beta_2}{k\beta_1 + \beta_2}.$$

18. Since the sample size is small relative to the size of the population, the distribution of the number X
of people in the sample who are watching will have essentially the binomial distribution with n = 200
and p = 15000/500000 = .03, even if sampling is done without replacement. This binomial distribution
is closely approximated by the Poisson distribution with mean $\lambda = np = 6$. Hence, from the table in the
back of the text,

$$\Pr(X < 4) = .0025 + .0149 + .0446 + .0892 = .1512.$$
19. It follows from Eq. (5.3.8) that

$$\mathrm{Var}(\overline{X}_n) = \frac{1}{n^2}\mathrm{Var}(X) = \frac{p(1 - p)}{n} \cdot \frac{T - n}{T - 1},$$

where T is the population size, p is the proportion of persons in the population who have the characteristic,
and n = 100. Since $p(1 - p) \le 1/4$ for $0 \le p \le 1$ and $(T - n)/(T - 1) \le 1$ for all values of T, it
follows that

$$\mathrm{Var}(\overline{X}_n) \le \frac{1}{400}.$$

Hence, the standard deviation is $\le 1/20 = .05$.


20. Consider the event that fewer than r successes are obtained in the first n Bernoulli trials. The left side
represents the probability of this event in terms of the binomial distribution. But the event also means 
that more than n trials are going to be required in order to obtain r successes, which means that more 
than n — r failures are going to be obtained before r successes are obtained. The right side expresses 
this probability in terms of the negative binomial distribution. 


21. Consider the event that there are at least k occurrences between time 0 and time t. The number X
of occurrences in this interval has the specified Poisson distribution, so the left side represents the 
probability of this event. But the event also means that the total waiting time Y until the kth event 
occurs is < t. It follows from part (b) of Exercise 15 that Y has the specified gamma distribution. 
Hence, the right side also expresses the probability of this same event. 
22. It follows from the definition of h(x) that

$$\int_0^x h(t)\,dt = \int_0^x \frac{f(t)}{1 - F(t)}\,dt = -\log[1 - F(x)].$$

Therefore,

$$\exp\left[-\int_0^x h(t)\,dt\right] = 1 - F(x).$$


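23. (a) It follows from Theorem 5.9.2 that

$$\rho(X_i, X_j) = \frac{\mathrm{Cov}(X_i, X_j)}{[\mathrm{Var}(X_i)\mathrm{Var}(X_j)]^{1/2}} = \frac{-10p_ip_j}{10[p_i(1 - p_i)p_j(1 - p_j)]^{1/2}} = -\left(\frac{p_i}{1 - p_i} \cdot \frac{p_j}{1 - p_j}\right)^{1/2}.$$

(b) $\rho(X_i, X_j)$ is most negative when $p_i$ and $p_j$ have their largest values; i.e., for i = 1 ($p_1 = .4$) and
j = 2 ($p_2 = .3$).

(c) $\rho(X_i, X_j)$ is closest to 0 when $p_i$ and $p_j$ have their smallest values; i.e., for i = 3 ($p_3 = .2$) and
j = 4 ($p_4 = .1$).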


24. It follows from Theorem 5.10.5 that $X_1 - 3X_2$ will have the normal distribution with mean $\mu_1 - 3\mu_2$
and variance $\sigma_1^2 + 9\sigma_2^2 - 6\rho\sigma_1\sigma_2$.


25. Since X has a normal distribution and the conditional distribution of Y given X is also normal with a
mean that is a linear function of X and constant variance, it follows that X and Y jointly have a bivariate
normal distribution. Hence, Y has a normal distribution. From Eq. (5.10.6), $2X - 3 = \mu_2 + \rho\sigma_2X$.
Hence, $\mu_2 = -3$ and $\rho\sigma_2 = 2$. Also, $(1 - \rho^2)\sigma_2^2 = 12$. Therefore $\sigma_2^2 = 16$ and $\rho = 1/2$. Thus, Y has the
normal distribution with mean −3 and variance 16, and $\rho(X, Y) = 1/2$.


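26. We shall use the relation

$$E(X_1^2X_2) = E[E(X_1^2X_2 \mid X_2)] = E[X_2E(X_1^2 \mid X_2)].$$

But

$$E(X_1^2 \mid X_2) = \mathrm{Var}(X_1 \mid X_2) + [E(X_1 \mid X_2)]^2 = (1 - \rho^2)\sigma_1^2 + \left(\mu_1 + \rho\frac{\sigma_1}{\sigma_2}X_2\right)^2.$$

Hence,

$$X_2E(X_1^2 \mid X_2) = (1 - \rho^2)\sigma_1^2X_2 + \mu_1^2X_2 + 2\rho\mu_1\frac{\sigma_1}{\sigma_2}X_2^2 + \left(\frac{\rho\sigma_1}{\sigma_2}\right)^2X_2^3.$$

The required value $E(X_1^2X_2)$ is the expectation of this quantity. But since $X_2$ has the normal distribution
with $E(X_2) = 0$, it follows that $E(X_2^2) = \sigma_2^2$ and $E(X_2^3) = 0$.

Hence,

$$E(X_1^2X_2) = 2\rho\mu_1\sigma_1\sigma_2.$$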
Large Random Samples


6.1 Introduction


Solutions to Exercises 


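1. The p.d.f. of $Y = X_1 + X_2$ is

$$g(y) = \begin{cases} y & \text{if } 0 < y \le 1, \\ 2 - y & \text{if } 1 < y < 2, \\ 0 & \text{otherwise.} \end{cases}$$

It follows easily from the fact that $\overline{X}_2 = Y/2$ that the p.d.f. of $\overline{X}_2$ is

$$h(x) = \begin{cases} 4x & \text{if } 0 < x \le 1/2, \\ 4 - 4x & \text{if } 1/2 < x < 1, \\ 0 & \text{otherwise.} \end{cases}$$

We easily compute

$$\Pr(|X_1 - 0.5| < 0.1) = 0.6 - 0.4 = 0.2,$$

$$\Pr(|\overline{X}_2 - 0.5| < 0.1) = \int_{0.4}^{0.5} 4x\,dx + \int_{0.5}^{0.6} (4 - 4x)\,dx = 2(0.5^2 - 0.4^2) + 4(0.6 - 0.5) - 2(0.6^2 - 0.5^2) = 0.36.$$

The reason that $\overline{X}_2$ has higher probability of being close to 0.5 is that its p.d.f. is much higher near
0.5 than is the uniform p.d.f. of $X_1$ (twice as high right at 0.5).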
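
The two probabilities can be checked from the c.d.f. obtained by integrating the triangular p.d.f. of the sample mean. The sketch below is illustrative; the function name `triangular_cdf` is an assumption.

```python
def triangular_cdf(x):
    """c.d.f. of the mean of two independent uniform(0, 1) random variables."""
    if x <= 0:
        return 0.0
    if x <= 0.5:
        return 2 * x ** 2                # integral of 4t from 0 to x
    if x < 1:
        return 1 - 2 * (1 - x) ** 2      # 1 minus the integral of (4 - 4t) from x to 1
    return 1.0

p_single = 0.6 - 0.4                                     # uniform observation
p_mean = triangular_cdf(0.6) - triangular_cdf(0.4)       # mean of two observations
print(p_single, p_mean)
```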


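2. The distribution of $\overline{X}_n$ is (by Corollary 5.6.2) the normal distribution with mean $\mu$ and variance $\sigma^2/n$.
By Theorem 5.6.6,

$$\Pr(|\overline{X}_n - \mu| \le c) = \Pr(\overline{X}_n - \mu \le c) - \Pr(\overline{X}_n - \mu < -c) = \Phi\left(\frac{c}{\sigma/n^{1/2}}\right) - \Phi\left(\frac{-c}{\sigma/n^{1/2}}\right). \qquad \text{(S.6.1)}$$

As $n \to \infty$, $c/(\sigma/n^{1/2}) \to \infty$ and $-c/(\sigma/n^{1/2}) \to -\infty$. It follows from Property 3.3.2 of all c.d.f.'s that
(S.6.1) goes to 1 as $n \to \infty$.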


3. To do this by hand we would have to add all of the binomial probabilities corresponding to
$W = 80, \ldots, 120$. Most statistical software will do this calculation automatically. The result is 0.9964. It
looks like the probability is increasing to 1.
6.2 The Law of Large Numbers


Commentary 

The discussion of the strong law of large numbers at the end of the section might be suitable only for the 
more mathematically inclined students. 

Solutions to Exercises 


1. Let $\epsilon > 0$. We need to show that

$$\lim_{n \to \infty} \Pr(|X_n - 0| \ge \epsilon) = 0. \qquad \text{(S.6.2)}$$

Since $X_n \ge 0$, we have $|X_n - 0| \ge \epsilon$ if and only if $X_n \ge \epsilon$. By the Markov inequality $\Pr(X_n \ge \epsilon) \le \mu_n/\epsilon$.
Since $\lim_{n \to \infty} \mu_n = 0$, Eq. (S.6.2) holds.

2. By the Markov inequality,

$$E(X) \ge 10\Pr(X \ge 10) = 2.$$

3. By the Chebyshev inequality,

$$\mathrm{Var}(X) \ge 9\Pr(|X - \mu| \ge 3) = 9\Pr(X \le 7 \text{ or } X \ge 13) = 9(0.2 + 0.3) = \frac{9}{2}.$$


4. Consider a distribution which is concentrated on the three points $\mu$, $\mu + 3\sigma$, and $\mu - 3\sigma$. Let
$\Pr(X = \mu) = p_1$, $\Pr(X = \mu + 3\sigma) = p_2$, and $\Pr(X = \mu - 3\sigma) = p_3$. If we are to have $E(X) = \mu$, then we must
have $p_2 = p_3$. Let p denote the common value of $p_2$ and $p_3$. Then $p_1 = 1 - 2p$, because $p_1 + p_2 + p_3 = 1$.
Now

$$\mathrm{Var}(X) = E[(X - \mu)^2] = 9\sigma^2(p) + 9\sigma^2(p) + 0(1 - 2p) = 18\sigma^2p.$$

Since we must have $\mathrm{Var}(X) = \sigma^2$, then we must choose p = 1/18. Therefore, the only distribution which
is concentrated on the three points $\mu$, $\mu + 3\sigma$, and $\mu - 3\sigma$, and for which $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$,
is the one with $p_1 = 8/9$ and $p_2 = p_3 = 1/18$. It can now be verified that for this distribution we have

$$\Pr(|X - \mu| \ge 3\sigma) = \Pr(X = \mu + 3\sigma) + \Pr(X = \mu - 3\sigma) = \frac{1}{18} + \frac{1}{18} = \frac{1}{9}.$$


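5. By the Chebyshev inequality,

$$\Pr(|\overline{X}_n - \mu| \le 2\sigma) \ge 1 - \frac{1}{4n}.$$

Therefore, we must have $1 - \frac{1}{4n} \ge 0.99$ or $n \ge 25$.

6. By the Chebyshev inequality,

$$\Pr(6 \le \overline{X}_n \le 7) = \Pr\left(|\overline{X}_n - \mu| \le \frac{1}{2}\right) \ge 1 - \frac{16}{n}.$$

Therefore, we must have $1 - \frac{16}{n} \ge 0.8$ or $n \ge 80$.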
7. By the Markov inequality,

$$\Pr(|X - \mu| \ge t) = \Pr(|X - \mu|^4 \ge t^4) \le \frac{E(|X - \mu|^4)}{t^4} = \frac{\beta_4}{t^4}.$$


8. (a) In this example $E(Q_n) = 0.3$ and $\mathrm{Var}(Q_n) = (0.3)(0.7)/n = 0.21/n$. Therefore,

$$\Pr(0.2 \le Q_n \le 0.4) = \Pr(|Q_n - E(Q_n)| \le 0.1) \ge 1 - \frac{0.21}{n(0.01)} = 1 - \frac{21}{n}.$$

Therefore, we must have $1 - \frac{21}{n} \ge 0.75$ or $n \ge 84$.

(b) Let $X_n$ denote the total number of items in the sample that are of poor quality. Then $X_n = nQ_n$
and

$$\Pr(0.2 \le Q_n \le 0.4) = \Pr(0.2n \le X_n \le 0.4n).$$

Since $X_n$ has a binomial distribution with parameters n and p = 0.3, the value of this probability
can be determined for various values of n from the table of the binomial distribution given in the
back of the book. For n = 15, it is found that

$$\Pr(0.2n \le X_n \le 0.4n) = \Pr(3 \le X_n \le 6) = 0.7419.$$

For n = 20, it is found that

$$\Pr(0.2n \le X_n \le 0.4n) = \Pr(4 \le X_n \le 8) = 0.7796.$$

Since this probability must be at least 0.75, we must have n = 20, although it is possible that
some value between n = 15 and n = 20 will also satisfy the required condition.
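
Instead of the table, the exact binomial probabilities in part (b) can be computed for each n from 15 to 20, which also addresses the intermediate sample sizes. This is a sketch; the function name `prob_between` and the small tolerance used to guard against floating-point error at the endpoints are assumptions.

```python
from math import ceil, comb, floor

def prob_between(n, p=0.3):
    """Pr(0.2n <= X <= 0.4n) for X binomial with parameters n and p."""
    lo = ceil(0.2 * n - 1e-9)            # smallest integer >= 0.2n
    hi = floor(0.4 * n + 1e-9)           # largest integer <= 0.4n
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(lo, hi + 1))

for n in range(15, 21):
    print(n, round(prob_between(n), 4), "Chebyshev bound:", round(1 - 21 / n, 4))
```

The Chebyshev bound from part (a) is negative for these n, which illustrates how conservative it is compared with the exact probabilities.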


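9. $E(Z_n) = n^2 \cdot \frac{1}{n} + 0\left(1 - \frac{1}{n}\right) = n$. Hence, $\lim_{n \to \infty} E(Z_n) = \infty$. Also, for any given $\epsilon > 0$,

$$\Pr(|Z_n| < \epsilon) = \Pr(Z_n = 0) = 1 - \frac{1}{n}.$$

Hence, $\lim_{n \to \infty} \Pr(|Z_n| < \epsilon) = 1$, which means that $Z_n \xrightarrow{p} 0$.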
10. By Exercise 5 of Sec. 4.3,

$$E[(Z_n - b)^2] = [E(Z_n) - b]^2 + \mathrm{Var}(Z_n).$$

Therefore, the limit of the left side will be 0 if and only if the limit of each of the two terms on the
right side is 0. Moreover, $\lim_{n \to \infty} [E(Z_n) - b]^2 = 0$ if and only if $\lim_{n \to \infty} E(Z_n) = b$.


11. Suppose that the sequence $Z_1, Z_2, \ldots$ converges to b in the quadratic mean. Since

$$|Z_n - b| \le |Z_n - E(Z_n)| + |E(Z_n) - b|,$$

then for any value of $\epsilon > 0$,

$$\Pr(|Z_n - b| < \epsilon) \ge \Pr(|Z_n - E(Z_n)| + |E(Z_n) - b| < \epsilon) = \Pr(|Z_n - E(Z_n)| < \epsilon - |E(Z_n) - b|).$$

By Exercise 10, we know that $\lim_{n \to \infty} E(Z_n) = b$. Therefore, for sufficiently large values of n, it will be
true that $\epsilon - |E(Z_n) - b| > 0$. Hence, by the Chebyshev inequality, the final probability will be at least
as large as

$$1 - \frac{\mathrm{Var}(Z_n)}{[\epsilon - |E(Z_n) - b|]^2}.$$

Again, by Exercise 10,

$$\lim_{n \to \infty} \mathrm{Var}(Z_n) = 0 \quad\text{and}\quad \lim_{n \to \infty} [\epsilon - |E(Z_n) - b|]^2 = \epsilon^2.$$

Therefore,

$$\lim_{n \to \infty} \Pr(|Z_n - b| < \epsilon) \ge \lim_{n \to \infty}\left(1 - \frac{\mathrm{Var}(Z_n)}{[\epsilon - |E(Z_n) - b|]^2}\right) = 1,$$

which means that $Z_n \xrightarrow{p} b$.


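12. We know that $E(\overline{X}_n) = \mu$ and $\mathrm{Var}(\overline{X}_n) = \sigma^2/n$. Therefore, $\lim_{n \to \infty} E(\overline{X}_n) = \mu$ and $\lim_{n \to \infty} \mathrm{Var}(\overline{X}_n) = 0$.
The desired result now follows from Exercise 10.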


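13. (a) For any value of n large enough so that $1/n < \epsilon$, we have

$$\Pr(|Z_n| < \epsilon) = \Pr\left(Z_n = \frac{1}{n}\right) = 1 - \frac{1}{n^2}.$$

Therefore, $\lim_{n \to \infty} \Pr(|Z_n| < \epsilon) = 1$, which means that $Z_n \xrightarrow{p} 0$.

(b) $E(Z_n) = \frac{1}{n}\left(1 - \frac{1}{n^2}\right) + n\left(\frac{1}{n^2}\right) = \frac{2}{n} - \frac{1}{n^3}$. Therefore, $\lim_{n \to \infty} E(Z_n) = 0$. It follows from Exercise 10
that the only possible value for the constant c is c = 0, and there will be convergence to this value
if and only if $\lim_{n \to \infty} \mathrm{Var}(Z_n) = 0$. But

$$E(Z_n^2) = \frac{1}{n^2}\left(1 - \frac{1}{n^2}\right) + n^2 \cdot \frac{1}{n^2} = 1 + \frac{1}{n^2} - \frac{1}{n^4}.$$

Hence, $\mathrm{Var}(Z_n) = 1 + \frac{1}{n^2} - \frac{1}{n^4} - \left(\frac{2}{n} - \frac{1}{n^3}\right)^2$ and $\lim_{n \to \infty} \mathrm{Var}(Z_n) = 1$.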

14. Let X have p.f. equal to f. Assume that Var(X) > 0 (otherwise it is surely less than 1/4). First,
suppose that X has only two possible values, 0 and 1. Let p = Pr(X = 1). Then $E(X) = E(X^2) = p$
and $\mathrm{Var}(X) = p - p^2$. The largest possible value of $p - p^2$ occurs when p = 1/2, and the value is 1/4. So
$\mathrm{Var}(X) \le 1/4$ if X only has the two possible values 0 and 1. For the remainder of the proof, we shall
show that if X has any possible values strictly between 0 and 1, then there is another random variable
Y taking only the values 0 and 1 and with $\mathrm{Var}(Y) \ge \mathrm{Var}(X)$. So, assume that X takes at least one
value strictly between 0 and 1. Without loss of generality, assume that one of those possible values is
between 0 and $\mu$. (Otherwise replace X by 1 − X which has the same variance.) Let $\mu = E(X)$, and
let $x_1, x_2, \ldots$ be the values such that $x_i \le \mu$ and $f(x_i) > 0$. Define a new random variable

$$X^* = \begin{cases} 0 & \text{if } X \le \mu, \\ X & \text{if } X > \mu. \end{cases}$$

The p.f. of $X^*$ is

$$f^*(x) = \begin{cases} f(x) & \text{for all } x > \mu, \\ \sum_i f(x_i) & \text{for } x = 0, \\ 0 & \text{otherwise.} \end{cases}$$

The mean of $X^*$ is $\mu^* = \mu - \sum_i x_i f(x_i)$. The mean of $X^{*2}$ is $E(X^2) - \sum_i x_i^2 f(x_i)$. So, the variance of
$X^*$ is

$$\mathrm{Var}(X^*) = E(X^2) - \sum_i x_i^2 f(x_i) - \left[\mu - \sum_i x_i f(x_i)\right]^2$$
$$= \mathrm{Var}(X) - \sum_i x_i^2 f(x_i) + 2\mu\sum_i x_i f(x_i) - \left[\sum_i x_i f(x_i)\right]^2. \qquad \text{(S.6.3)}$$

Since $x_i \le \mu$ for each i, we have

$$-\sum_i x_i^2 f(x_i) + 2\mu\sum_i x_i f(x_i) \ge \sum_i x_i^2 f(x_i). \qquad \text{(S.6.4)}$$

Let $t = \sum_i f(x_i) > 0$. Then

$$g(x) = \begin{cases} f(x)/t & \text{for } x \in \{x_1, x_2, \ldots\}, \\ 0 & \text{otherwise,} \end{cases}$$

is a p.f. Let Z be a random variable whose p.f. is g. Then

$$E(Z) = \frac{1}{t}\sum_i x_i f(x_i),$$
$$\mathrm{Var}(Z) = \frac{1}{t}\sum_i x_i^2 f(x_i) - \frac{1}{t^2}\left[\sum_i x_i f(x_i)\right]^2.$$

Since $\mathrm{Var}(Z) \ge 0$ and $t \le 1$, we have

$$\sum_i x_i^2 f(x_i) \ge \frac{1}{t}\left[\sum_i x_i f(x_i)\right]^2 \ge \left[\sum_i x_i f(x_i)\right]^2.$$

Combine this with (S.6.3) and (S.6.4) to see that $\mathrm{Var}(X^*) \ge \mathrm{Var}(X)$. If $f^*(x) > 0$ for some x strictly
between 0 and 1, replace $X^*$ by $1 - X^*$ and repeat the above process to produce the desired random
variable Y.


We need to prove that, for every $\epsilon > 0$,

\[ \lim_{n\to\infty} \Pr(|g(Z_n) - g(b)| < \epsilon) = 1. \]

Let $\epsilon > 0$. Since $g$ is continuous at $b$, there exists $\delta$ such that $|z - b| < \delta$ implies that $|g(z) - g(b)| < \epsilon$.
Also, since $Z_n \stackrel{p}{\to} b$, we know that

\[ \lim_{n\to\infty} \Pr(|Z_n - b| < \delta) = 1. \]

But $\{|Z_n - b| < \delta\} \subset \{|g(Z_n) - g(b)| < \epsilon\}$. So

\[ \Pr(|g(Z_n) - g(b)| < \epsilon) \ge \Pr(|Z_n - b| < \delta). \quad (S.6.5) \]

Since the right side of (S.6.5) goes to 1 as $n \to \infty$, so does the left side.


16. The argument here is similar to that given in Exercise 15. Let $\epsilon > 0$. Since $g$ is continuous at $(b, c)$,
there exists $\delta$ such that $\sqrt{(z - b)^2 + (y - c)^2} < \delta$ implies $|g(z, y) - g(b, c)| < \epsilon$. Also, $|z - b| < \delta/\sqrt{2}$
and $|y - c| < \delta/\sqrt{2}$ together imply $\sqrt{(z - b)^2 + (y - c)^2} < \delta$. Let $B_n = \{|Z_n - b| < \delta/\sqrt{2}\}$ and
$C_n = \{|Y_n - c| < \delta/\sqrt{2}\}$. It follows that

\[ B_n \cap C_n \subset \{|g(Z_n, Y_n) - g(b, c)| < \epsilon\}. \quad (S.6.6) \]

We can write

\[ \Pr(B_n \cap C_n) = 1 - \Pr([B_n \cap C_n]^c) = 1 - \Pr(B_n^c \cup C_n^c) \ge 1 - \Pr(B_n^c) - \Pr(C_n^c) = \Pr(B_n) + \Pr(C_n) - 1. \]

Combining this with (S.6.6), we get

\[ \Pr(|g(Z_n, Y_n) - g(b, c)| < \epsilon) \ge \Pr(B_n) + \Pr(C_n) - 1. \]

Since $Z_n \stackrel{p}{\to} b$ and $Y_n \stackrel{p}{\to} c$, we know that both $\Pr(B_n)$ and $\Pr(C_n)$ go to 1 as $n \to \infty$. Hence
$\Pr(|g(Z_n, Y_n) - g(b, c)| < \epsilon)$ goes to 1 as well.


17. (a) The mean of $X$ is $np$, and the mean of $Y$ is $np/k$. Since $Z = kY$, the mean of $Z$ is $knp/k = np$.


(b) The variance of $X$ is $np(1 - p)$, and the variance of $Y$ is $n(p/k)(1 - p/k)$. So, the variance of $Z = kY$
is $k^2$ times the variance of $Y$, i.e.,


\[ \mathrm{Var}(Z) = k^2 n(p/k)(1 - p/k) = knp(1 - p/k). \]


If p is small, then both 1 — p and 1 — p/k will be close to 1, and Var(Z) is approximately knp 
while the variance of X is approximately np. 


(c) In Fig. 6.1, each bar has height equal to 0.01 times a binomial random variable with parameters 
100 and the probability that X 7 is in the interval under the bar. In Fig. 6.2, each bar has height 
equal to 0.02 times a binomial random variable with parameters 100 and probability that X, is 
in the interval under the bar. The bars in Fig. 6.2 have approximately one-half of the probability 
of the bars in Fig. 6.1, but their heights have been multiplied by 2. By part (b), we expect the 
heights in Fig. 6.2 to have approximately twice the variance of the heights in Fig. 6.1. 


18. The result is trivial if the m.g.f. is infinite for all $s > 0$. So, assume that the m.g.f. is finite for at least
some $s > 0$. For every $t$ and every $s > 0$ such that the m.g.f. is finite, we can write

\[ \Pr(X > t) = \Pr(\exp(sX) > \exp(st)) \le \frac{E(\exp(sX))}{\exp(st)} = \psi(s)\exp(-st), \]

where the inequality follows from the Markov inequality. Since $\Pr(X > t) \le \psi(s)\exp(-st)$ for
every such $s$, $\Pr(X > t) \le \min_s \psi(s)\exp(-st)$.
19. (a) First, insert s from (6.2.15) into the expression in (6.2.14). We get


l=p (l+u)jp+l—p l~p 
n |log(p + (=? +) log {SEPT Par — ph tog ee Kes ee al 
| (P) D up+1—p Crue t—? (1 — p) 


The last term can be rewritten as 
= 
—log {1 = | 
(l+u)p+l—p 
The result is then 


n |(— +u) log { -»)} +log{(l+u)p+1 -»}| ‘ 


= —log(p) + log {(1 + u)p+1—p}. 


This is easily recognized as n times the logarithm of (6.2.16). 


(b) For all u, q is given by (6.2.16). For u = 0, q = (1—p)“~?)/”. Since 0 < 1—p < land (1—p)/p > 0, 
we have 0 < q <1 when u=0. For general u, let x = p(1+u)+1-—p and rewrite 
(~p)(p+2) 


log(q) = log(p + x) + i 


pre 
p 
Since x is a linear increasing function of u, if we show that log(q) is decreasing in x, then q is
decreasing in u. The derivative of log(q) with respect to x is 
1 1— £ 
iP a Pigg P)(p+ x) 
t(p+ax)  p £ 

The first term is negative, and the second term is negative at u = 0 (x = 1). To be sure that the 
sum is always negative, examine the second term more closely. The derivative of the second term 
is 


1 ( 1 -) —1 
— {| —— — —]) = — — <0 
p\p+2 x) x(p+z) 
Hence, the derivative is always negative, and q is less than 1 for all u.


20. We already have the m.g.f. of $Y$ in (6.2.9). We can multiply it by $e^{-sn/10}$ and minimize over $s > 0$.
Before minimizing, take the logarithm:

\[ \log[\psi(s)e^{-sn/10}] = n\left\{\log(1/2) + \log[\exp(s) + 1] - \frac{3s}{5}\right\}. \quad (S.6.7) \]

The derivative of this logarithm is

\[ n\left\{\frac{\exp(s)}{\exp(s) + 1} - \frac{3}{5}\right\}. \]

The derivative is 0 at $s = \log(3/2)$, and the second derivative is positive there, so $s = \log(3/2)$ provides
the minimum. The minimum value of (S.6.7) is $-0.02014n$, and the Chernoff bound is $\exp(-0.02014n) =
(0.98)^n$ for $\Pr(Y > n/10)$. Similarly, for $\Pr(-Y > n/10)$, we need to minimize

\[ \log[\psi(-s)e^{-sn/10}] = n\left\{\log(1/2) + \log[\exp(-s) + 1] + \frac{2s}{5}\right\}. \quad (S.6.8) \]

The derivative is

\[ n\left\{\frac{-\exp(-s)}{\exp(-s) + 1} + \frac{2}{5}\right\}, \]

which equals 0 at $s = \log(3/2)$. The minimum value of (S.6.8) is again $-0.02014n$, and the Chernoff
bound for the entire probability is $2(0.98)^n$, a bit smaller than in the example.
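The minimization in Exercise 20 can be checked numerically. The sketch below assumes that the per-trial exponent in (S.6.7) is $\log(1/2) + \log[\exp(s) + 1] - 3s/5$, which is the form consistent with the derivative vanishing at $s = \log(3/2)$; the function name and the grid search are illustrative choices only.

```python
import math

def log_bound_per_trial(s):
    """(1/n) * log[psi(s) * exp(-s*n/10)], assumed to equal
    log(1/2) + log(exp(s) + 1) - 3s/5 (hypothetical helper)."""
    return math.log(0.5) + math.log(math.exp(s) + 1.0) - 0.6 * s

s_star = math.log(1.5)                     # stationary point from the solution
min_value = log_bound_per_trial(s_star)    # per-trial exponent at the minimum
base = math.exp(min_value)                 # the Chernoff bound is base**n

# Crude grid search over (0, 2] as an independent check of the minimizer.
grid = [i / 10000 for i in range(1, 20001)]
s_grid = min(grid, key=log_bound_per_trial)

print(s_star, min_value, base, s_grid)
```

The grid minimizer should land next to $\log(3/2)$, and `base` should be close to the 0.98 quoted in the solution.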


21. (a) The m.g.f. of the exponential distribution with parameter 1 is $1/(1 - s)$ for $s < 1$, hence the m.g.f.
of $Y_n$ is $1/(1 - s)^n$ for $s < 1$. The Chernoff bound is the minimum (over $s > 0$) of $e^{-nus}/(1 - s)^n$.
The logarithm of this is $-n[us + \log(1 - s)]$, which is minimized at $s = (u - 1)/u$, which is positive
if and only if $u > 1$. The Chernoff bound is $[u \exp(1 - u)]^n$.


(b) If $u < 1$, then the expression in Theorem 6.2.7 is minimized over $s > 0$ near $s = 0$, which provides
a useless bound of 1 for $\Pr(Y_n > nu)$.
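As a rough numerical illustration of Exercise 21, the sketch below compares the closed-form bound $[u\exp(1-u)]^n$ with a brute-force minimization of $e^{-nus}/(1-s)^n$ over a grid of $s$ in (0, 1). The function names, the grid size, and the test values $u = 2$, $u = 0.5$, $n = 5$ are assumptions made for the example.

```python
import math

def chernoff_closed_form(u, n):
    """Closed-form Chernoff bound [u * exp(1 - u)]**n, valid for u > 1."""
    return (u * math.exp(1.0 - u)) ** n

def chernoff_grid(u, n, steps=100000):
    """Minimize exp(-n*u*s) / (1 - s)**n over a grid of s in (0, 1).
    The limiting value as s -> 0 is 1, so start from that."""
    best = 1.0
    for i in range(1, steps):
        s = i / steps
        log_val = -n * (u * s + math.log(1.0 - s))
        best = min(best, math.exp(log_val))
    return best

print(chernoff_closed_form(2.0, 5), chernoff_grid(2.0, 5))
print(chernoff_grid(0.5, 5))  # for u < 1 the grid never beats the trivial bound 1
```

For $u > 1$ the two values should agree closely; for $u < 1$ the grid search cannot improve on 1, in line with part (b).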


22. (a) The numbers $(k - 1)k/2$ for $k = 1, 2, \ldots$ form a strictly increasing sequence starting at 0. Hence,
every integer $n$ falls between a unique pair of these numbers. So, $k_n$ is the value of $k$ such that $n$
is larger than $(k - 1)k/2$ but no larger than $k(k + 1)/2$.


(b) Clearly $j_n$ is the excess of $n$ over the lower bound in part (a), hence $j_n$ runs from 1 up to the
difference between the bounds, which is easily seen to be $k_n$.


(c) The intervals where $h_n$ equals 1 are defined to be disjoint for $j_n = 1, \ldots, k_n$, and they cover the
whole interval $[0, 1)$. Hence, for each $x$, $h_n(x) = 1$ for one and only one of these intervals, which
correspond to $n$ between the bounds in part (a).


(d) For every $x \in [0, 1)$, $h_n(x) = 1$ for one $n$ between the bounds in part (a). Since there are infinitely
many values of $k_n$, $h_n(x) = 1$ infinitely often for every $x \in [0, 1)$, and $\Pr(X \in [0, 1)) = 1$.


(e) For every $\epsilon$ with $0 < \epsilon < 1$, $|Z_n - 0| > \epsilon$ whenever $Z_n = 1$. Since $\Pr(Z_n = 1 \text{ infinitely often}) = 1$, the
probability is 1 that $Z_n$ fails to converge to 0. Hence, the probability is 0 that $Z_n$ does converge
to 0.

(f) Notice that $h_n(x) = 1$ on an interval of length $1/k_n$. Hence, for each $n$ and each $0 < \epsilon < 1$, $\Pr(|Z_n - 0| > \epsilon) = 1/k_n$,
which goes to 0. So, $Z_n \stackrel{p}{\to} 0$.


23. Each $Z_n$ has the Bernoulli distribution with parameter $1/k_n$, hence $E[(Z_n - 0)^2] = 1/k_n$, which goes
to 0.


24. (a) By construction, $\{Z_n \text{ converges to } 0\} = \{X > 0\}$. Since $\Pr(X > 0) = 1$, we have $Z_n$ converges to
0 with probability 1.


(b) E[(Z, — 0)?] = E(Z?2) = n*/n, which does not go to 0. 


6.3. The Central Limit Theorem 


Commentary 


The delta method is introduced as a practical application of the central limit theorem. The examples of the 
delta method given in this section are designed to help pave the way for some approximate confidence interval 
calculations that arise in Sec. 8.5. The delta method also helps in calculating the approximate distributions 
of some summaries of simulations that arise in Sec. 12.2. This section ends with two theoretical topics that 
might be of interest only to the more mathematically inclined students. The first is a central limit theorem 
for random variables that don’t have identical distributions. The second is an outline of the proof of the i.i.d. 
central limit theorem that makes use of moment generating functions. 


Copyright © 2012 Pearson Education, Inc. Publishing as Addison-Wesley. 




Solutions to Exercises 


1. The length of rope produced in one hour, $X$, has a mean of $60 \times 4 = 240$ feet and a standard deviation
of $60^{1/2} \times 5 = 38.73$ inches, which is 3.23 feet. The probability that $X \ge 250$ is approximately the
probability that a normal random variable with mean 240 and standard deviation 3.23 is at least 250,
namely $1 - \Phi([250 - 240]/3.23) = 1 - \Phi(3.1) = 0.001$.


2. The total number of people $X$ from the suburbs attending the concert can be regarded as the sum of
1200 independent random variables, each of which has a Bernoulli distribution with parameter $p = 1/4$.
Therefore, the distribution of $X$ will be approximately a normal distribution with mean $1200(1/4) = 300$
and variance $1200(1/4)(3/4) = 225$. If we let $Z = (X - 300)/15$, then the distribution of $Z$ will be
approximately a standard normal distribution. Hence,

\[ \Pr(X < 270) = \Pr(Z < -2) \approx 1 - \Phi(2) = 0.0227. \]


3. Since the variance of a Poisson distribution is equal to the mean, the number of defects on any bolt
has mean 5 and variance 5. Therefore, the distribution of the average number $\bar{X}_n$ on the 125 bolts
will be approximately the normal distribution with mean 5 and variance $5/125 = 1/25$. If we let
$Z = (\bar{X}_n - 5)/(1/5)$, then the distribution of $Z$ will be approximately a standard normal distribution.
Hence,

\[ \Pr(\bar{X}_n < 5.5) = \Pr(Z < 2.5) \approx \Phi(2.5) = 0.9938. \]


4. The distribution of $Z = \sqrt{n}(\bar{X}_n - \mu)/3$ will be approximately the standard normal distribution. Therefore,

\[ \Pr(|\bar{X}_n - \mu| < 0.3) = \Pr(|Z| < 0.1\sqrt{n}) \approx 2\Phi(0.1\sqrt{n}) - 1. \]

But $2\Phi(0.1\sqrt{n}) - 1 \ge 0.95$ if and only if $\Phi(0.1\sqrt{n}) \ge (1 + 0.95)/2 = 0.975$, and this inequality is satisfied
if and only if $0.1\sqrt{n} \ge 1.96$ or, equivalently, $n \ge 384.16$. Hence, the smallest possible value of $n$ is 385.
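The sample-size search in Exercise 4 can be reproduced without a normal table. The sketch below builds the standard normal c.d.f. from `math.erf` and increases $n$ until the approximate probability reaches the target; the function names `phi` and `smallest_n` are hypothetical, and the values $\sigma = 3$, half-width 0.3, and level 0.95 are read off the solution above.

```python
import math

def phi(z):
    """Standard normal c.d.f. computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def smallest_n(sigma, half_width, level):
    """Smallest n with 2*Phi(half_width*sqrt(n)/sigma) - 1 >= level."""
    n = 1
    while 2.0 * phi(half_width * math.sqrt(n) / sigma) - 1.0 < level:
        n += 1
    return n

print(smallest_n(3.0, 0.3, 0.95))
```

With these inputs the loop should stop at the same value, 385, that the solution obtains from the table entry 1.96.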


5. The distribution of the proportion $\bar{X}_n$ of defective items in the sample will be approximately the
normal distribution with mean 0.1 and variance $(0.1)(0.9)/n = 0.09/n$. Therefore, the distribution of
$Z = \sqrt{n}(\bar{X}_n - 0.1)/0.3$ will be approximately the standard normal distribution. It follows that

\[ \Pr(\bar{X}_n < 0.13) = \Pr(Z < 0.1\sqrt{n}) \approx \Phi(0.1\sqrt{n}). \]

For this value to be at least 0.99, we must have $0.1\sqrt{n} \ge 2.327$ or, equivalently, $n \ge 541.5$. Hence, the
smallest possible value of $n$ is 542.


6. The distribution of the total number of times $X$ that the target is hit will be approximately the normal
distribution with mean $10(0.3) + 15(0.2) + 20(0.1) = 8$ and variance $10(0.3)(0.7) + 15(0.2)(0.8) +
20(0.1)(0.9) = 6.3$. Therefore, the distribution of $Z = (X - 8)/\sqrt{6.3} = (X - 8)/2.51$ will be approximately
a standard normal distribution. It follows that

\[ \Pr(X \ge 12) = \Pr(Z \ge 1.5936) \approx 1 - \Phi(1.5936) = 0.0555. \]


7. The mean of a random digit $X$ is

\[ E(X) = \frac{1}{10}(0 + 1 + \cdots + 9) = 4.5. \]

Also,

\[ E(X^2) = \frac{1}{10}(0^2 + 1^2 + \cdots + 9^2) = \frac{1}{10} \cdot \frac{(9)(10)(19)}{6} = 28.5. \]

Therefore, $\mathrm{Var}(X) = 28.5 - (4.5)^2 = 8.25$. The distribution of the average $\bar{X}_n$ of 16 random digits
will therefore be approximately the normal distribution with mean 4.5 and variance $8.25/16 = 0.5156$.
Hence, the distribution of

\[ Z = \frac{\bar{X}_n - 4.5}{\sqrt{0.5156}} = \frac{\bar{X}_n - 4.5}{0.7181} \]

will be approximately a standard normal distribution. It follows that

\[ \Pr(4 \le \bar{X}_n \le 6) = \Pr(-0.6963 \le Z \le 2.0888) \approx \Phi(2.0888) - [1 - \Phi(0.6963)] = 0.9816 - 0.2431 = 0.7385. \]
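The arithmetic in Exercise 7 is easy to reproduce directly. The following sketch recomputes the mean and variance of a single random digit and then the normal approximation for the average of 16 digits; the helper `phi` and the variable names are assumptions for this illustration, and the normal c.d.f. is built from `math.erf` instead of a table.

```python
import math

def phi(z):
    """Standard normal c.d.f. computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

digits = range(10)
mean = sum(digits) / 10                      # E(X) for one random digit
second = sum(d * d for d in digits) / 10     # E(X^2)
var = second - mean**2                       # Var(X)

n = 16
sd_avg = math.sqrt(var / n)                  # standard deviation of the average
prob = phi((6 - mean) / sd_avg) - phi((4 - mean) / sd_avg)

print(mean, second, var, sd_avg, prob)
```

The printed probability should agree with the tabled answer 0.7385 to about three decimal places.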


8. The distribution of the total amount X of 36 drinks will be approximately the normal distribution with 
mean 36(2) = 72 and variance 36(1/4) = 9. Therefore, the distribution of Z = (X — 72)/3 will be 
approximately a standard normal distribution. It follows that 


\[ \Pr(X < 63) = \Pr(Z < -3) = 1 - \Phi(3) = 0.0013. \]


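9. (a) By Eq. (6.2.4),

\[ \Pr\left(|\bar{X}_n - \mu| \ge \frac{\sigma}{4}\right) \le \frac{\sigma^2}{n} \cdot \frac{16}{\sigma^2} = \frac{16}{n} = \frac{16}{25}. \]

Therefore, $\Pr\left(|\bar{X}_n - \mu| \le \frac{\sigma}{4}\right) \ge 1 - \frac{16}{25} = 0.36$.


(b) The distribution of

\[ Z = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} = \frac{5}{\sigma}(\bar{X}_n - \mu) \]

will be approximately a standard normal distribution. Therefore,

\[ \Pr\left(|\bar{X}_n - \mu| \le \frac{\sigma}{4}\right) = \Pr\left(|Z| \le \frac{5}{4}\right) \approx 2\Phi(1.25) - 1 = 0.7887. \]


10. (a) As in part (a) of Exercise 9,

\[ \Pr\left(|\bar{X}_n - \mu| \le \frac{\sigma}{4}\right) \ge 1 - \frac{16}{n}. \]

Now $1 - 16/n \ge 0.99$ if and only if $n \ge 1600$.

(b) As in part (b) of Exercise 9,

\[ \Pr\left(|\bar{X}_n - \mu| \le \frac{\sigma}{4}\right) = \Pr\left(|Z| \le \frac{\sqrt{n}}{4}\right) \approx 2\Phi\left(\frac{\sqrt{n}}{4}\right) - 1. \]

Now $2\Phi(\sqrt{n}/4) - 1 \ge 0.99$ if and only if $\Phi(\sqrt{n}/4) \ge 0.995$. This inequality will be satisfied if and
only if $\sqrt{n}/4 \ge 2.567$ or, equivalently, $n \ge 105.4$. Therefore, the smallest possible sample size is
106.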
11. For a student chosen at random, the number of parents $X$ who will attend the graduation ceremony has
mean $\mu = 0/3 + 1/3 + 2/3 = 1$ and variance $\sigma^2 = E[(X - \mu)^2] = (0 - 1)^2/3 + (1 - 1)^2/3 + (2 - 1)^2/3 =
2/3$. Therefore, the distribution of the total number of parents $W$ who attend the ceremony will
be approximately the normal distribution with mean $(600)(1) = 600$ and variance $600(2/3) = 400$.
Therefore, the distribution of $Z = (W - 600)/20$ will be approximately a standard normal distribution.
It follows that

\[ \Pr(W \le 650) = \Pr(Z \le 2.5) = \Phi(2.5) = 0.9938. \]


12. The m.g.f. of the binomial distribution with parameters $n$ and $p_n$ is $\psi_n(t) = (p_n \exp(t) + 1 - p_n)^n$. If
$np_n \to \lambda$,

\[ \lim_{n\to\infty} \psi_n(t) = \lim_{n\to\infty} \left(1 + \frac{np_n}{n}[\exp(t) - 1]\right)^n. \]

This converges to $\exp(\lambda[e^t - 1])$, which is the m.g.f. of the Poisson distribution with mean $\lambda$.


13. We are asking for the asymptotic distribution of $g(\bar{X}_n)$, where $g(x) = x^3$. The distribution of $\bar{X}_n$ is
normal with mean $\theta$ and variance $\sigma^2/n$. According to the delta method, the asymptotic distribution of
$g(\bar{X}_n)$ should be the normal distribution with mean $g(\theta) = \theta^3$ and variance $(\sigma^2/n)[g'(\theta)]^2 = 9\theta^4\sigma^2/n$.


14. First, note that $Y_n = \sum_{i=1}^n X_i^2/n$ has asymptotically the normal distribution with mean $\sigma^2$ and variance
$2\sigma^4/n$. Here, we have used the fact that $E(X_i^2) = \sigma^2$ and $E(X_i^4) = 3\sigma^4$, so that $\mathrm{Var}(X_i^2) = 2\sigma^4$.


(a) Let $g(x) = 1/x$. Then $g'(x) = -1/x^2$. So, the asymptotic distribution of $g(Y_n)$ is the normal
distribution with mean $1/\sigma^2$ and variance $(2\sigma^4/n)/\sigma^8 = 2/[n\sigma^4]$.


(b) Let $h(\mu) = 2\mu^2$. If the asymptotic mean of $Y_n$ is $\mu$, the asymptotic variance of $Y_n$ is $h(\mu)/n$. So,
a variance stabilizing transformation is

\[ \alpha(\mu) = \int_a^\mu \frac{dx}{2^{1/2} x} = \frac{1}{2^{1/2}} \log(\mu), \]

where we have taken $a = 1$ to make the integral finite. So the asymptotic distribution of
$\log(Y_n)/2^{1/2}$ is the normal distribution with mean $2\log(\sigma)/2^{1/2}$ and variance $1/n$.


15. (a) Clearly, $Y_n \le y$ if and only if $X_i \le y$ for $i = 1, \ldots, n$. Hence,

\[ \Pr(Y_n \le y) = \prod_{i=1}^n \Pr(X_i \le y) = \begin{cases} (y/\theta)^n & \text{if } 0 < y < \theta, \\ 0 & \text{if } y \le 0, \\ 1 & \text{if } y \ge \theta. \end{cases} \]


(b) The c.d.f. of $Z_n$ is, for $z < 0$,

\[ \Pr(Z_n \le z) = \Pr(Y_n \le \theta + z/n) = (1 + z/[n\theta])^n. \quad (S.6.9) \]

Since $Z_n \le 0$, the c.d.f. is 1 for $z \ge 0$. According to Theorem 5.3.3, the expression in (S.6.9)
converges to $\exp(z/\theta)$.


(c) Let $\alpha(y) = y^2$. Then $\alpha'(y) = 2y$. We have $n(Y_n - \theta)$ converging in distribution to the c.d.f. in
part (b). The delta method says that, for $\theta > 0$, $n(Y_n^2 - \theta^2)/[2\theta]$ converges in distribution to the
same c.d.f.
6.4 The Correction for Continuity


Solutions to Exercises 


1. The mean of $X_i$ is 1 and the mean of $X_i^2$ is 1.5. So, the variance of $X_i$ is 0.5. The central limit theorem
says that $Y = X_1 + \cdots + X_{30}$ has approximately the normal distribution with mean 30 and variance
15. We want the probability that $Y \le 33$. Using the correction for continuity, we would assume that $Y$
has the normal distribution with mean 30 and variance 15 and compute the probability that $Y \le 33.5$.
This is $\Phi([33.5 - 30]/15^{1/2}) = \Phi(0.904) = 0.8169$.


2. (a) $E(X) = 15(.3) = 4.5$ and $\sigma_X = [(15)(.3)(.7)]^{1/2} = 1.775$. Therefore,

\[ \Pr(X = 4) = \Pr(3.5 \le X \le 4.5) = \Pr\left(\frac{3.5 - 4.5}{1.775} \le Z \le 0\right) = \Pr(-.5634 \le Z \le 0) \approx \Phi(.5634) - .5 \approx .214. \]


(b) The exact value is found from the table of binomial probabilities ($n = 15$, $p = 0.3$, $k = 4$) to be
.2186.
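The comparison in Exercise 2 between the continuity-corrected normal approximation and the exact binomial probability can be recomputed as follows. This is a minimal sketch: `phi` is a hypothetical helper built on `math.erf`, and because it does not interpolate in a printed table, the approximate value may differ from the tabled one in the third decimal place.

```python
import math

def phi(z):
    """Standard normal c.d.f. computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p, k = 15, 0.3, 4
mean = n * p
sd = math.sqrt(n * p * (1 - p))

# Exact binomial probability Pr(X = 4).
exact = math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Normal approximation with the correction for continuity: Pr(3.5 <= X <= 4.5).
approx = phi((k + 0.5 - mean) / sd) - phi((k - 0.5 - mean) / sd)

print(sd, exact, approx)
```

The two printed probabilities should differ by only a few thousandths, which is the point of the exercise.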


3. In the notation of Example 2,

\[ \Pr(H > 495) = \Pr(H \ge 495.5) = \Pr\left(Z \ge \frac{495.5 - 450}{15}\right) \approx 1 - \Phi(3.033) \approx .0012. \]


4. We follow the notation of the solution to Exercise 2 of Sec. 6.3:

\[ \Pr(X < 270) = \Pr(X \le 269.5) = \Pr\left(Z \le \frac{269.5 - 300}{15}\right) \approx 1 - \Phi(2.033) \approx .0210. \]


5. Let $X$ denote the total number of defects in the sample. Then $X$ has a Poisson distribution with mean
$5(125) = 625$, so $\sigma_X$ is $(625)^{1/2} = 25$. Hence,

\[ \Pr(\bar{X}_n < 5.5) = \Pr[X < 125(5.5)] = \Pr(X < 687.5). \]

Since this final probability is just the value that would be used with the correction for continuity, the
probability to be found here is the same as that originally found in Exercise 3 of Sec. 6.3.


6. We follow the notation of the solution to Exercise 6 of Sec. 6.3:

\[ \Pr(X \ge 12) = \Pr(X \ge 11.5) = \Pr\left(Z \ge \frac{11.5 - 8}{2.51}\right) \approx 1 - \Phi(1.394) \approx .082. \]


7. Let $S$ denote the sum of the 16 digits. Then

\[ E(S) = 16(4.5) = 72 \quad \text{and} \quad \sigma_S = [16(8.25)]^{1/2} = 11.49. \]

Hence,

\[ \Pr(4 \le \bar{X}_n \le 6) = \Pr(64 \le S \le 96) = \Pr(63.5 \le S \le 96.5) \]
\[ = \Pr\left(\frac{63.5 - 72}{11.49} \le Z \le \frac{96.5 - 72}{11.49}\right) \approx \Phi(2.132) - \Phi(-.740) \approx .9835 - .2296 = .7539. \]
6.5 Supplementary Exercises


Solutions to Exercises 


1. By the central limit theorem, the distribution of $X$ is approximately normal with mean $(120)(1/6) = 20$
and standard deviation $[120(1/6)(5/6)]^{1/2} = 4.082$. Let $Z = (X - 20)/4.082$. Then from the table of
the standard normal distribution we find that $\Pr(|Z| < 1.96) = .95$. Hence, $k = (1.96)(4.082) = 8.00$.


2. Because of the property of the Poisson distribution described in Theorem 5.4.4, the random variable X 
can be thought of as the sum of a large number of i.i.d. random variables, each of which has a Poisson 
distribution. Hence, the central limit theorem (Lindeberg and Lévy) implies the desired result. It can 
also be shown that the m.g.f. of X converges to the m.g.f. of the standard normal distribution. 


3. By the previous exercise, $X$ has approximately a normal distribution with mean 10 and standard
deviation $(10)^{1/2} = 3.162$. Thus, without the correction for continuity,

\[ \Pr(8 \le X \le 12) = \Pr\left(\frac{8 - 10}{3.162} \le Z \le \frac{12 - 10}{3.162}\right) \approx \Phi(.6325) - \Phi(-.6325) = .473. \]

With the correction for continuity, we find

\[ \Pr(7.5 \le X \le 12.5) = \Pr\left(-\frac{2.5}{3.162} \le Z \le \frac{2.5}{3.162}\right) \approx \Phi(.7906) - \Phi(-.7906) = .571. \]

The exact probability is found from the Poisson table to be

\[ (.1126) + (.1251) + (.1251) + (.1137) + (.0948) = .571. \]

Thus, the approximation with the correction for continuity is almost perfect.
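The three numbers compared in Exercise 3 can be recomputed without tables. The sketch below assumes, as the solution does, that $X$ is Poisson with mean 10; the helper names `phi` and `poisson_pmf` are illustrative.

```python
import math

def phi(z):
    """Standard normal c.d.f. computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def poisson_pmf(k, mean):
    """Poisson probability exp(-mean) * mean**k / k!."""
    return math.exp(-mean) * mean**k / math.factorial(k)

mean = 10.0
sd = math.sqrt(mean)

exact = sum(poisson_pmf(k, mean) for k in range(8, 13))    # Pr(8 <= X <= 12)
plain = phi((12 - mean) / sd) - phi((8 - mean) / sd)        # no correction
corrected = phi((12.5 - mean) / sd) - phi((7.5 - mean) / sd)  # with correction

print(exact, plain, corrected)
```

The corrected approximation should sit within about 0.001 of the exact sum, while the uncorrected one is off by roughly 0.1.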
4. If $X$ has p.d.f. $f(x)$, then

\[ E(X^k) = \int_0^\infty x^k f(x)\,dx \ge \int_t^\infty x^k f(x)\,dx \ge t^k \int_t^\infty f(x)\,dx = t^k \Pr(X \ge t). \]


A similar proof holds if X has a discrete distribution. 


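5. The central limit theorem says that $\bar{X}_n$ has approximately the normal distribution with mean $p$ and
variance $p(1 - p)/n$. A variance stabilizing transformation will be

\[ \alpha(x) = \int_a^x [p(1 - p)]^{-1/2}\,dp. \]

To perform this integral, transform to $z = p^{1/2}$, that is, $p = z^2$, so that $dp = 2z\,dz$. Then

\[ \alpha(x) = \int_{a^{1/2}}^{x^{1/2}} \frac{2\,dz}{(1 - z^2)^{1/2}}. \]

Next, transform so that $z = \sin(w)$ or $w = \arcsin(z)$. Then $dz = \cos(w)\,dw$ and

\[ \alpha(x) = \int_{\arcsin a^{1/2}}^{\arcsin x^{1/2}} 2\,dw = 2\arcsin x^{1/2}, \]

where we have chosen $a = 0$. Since a constant multiple of a variance stabilizing transformation is also
variance stabilizing, we can take the variance stabilizing transformation to be $\alpha(x) = \arcsin(x^{1/2})$.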
6. According to the central limit theorem, $\bar{X}_n$ has approximately the normal distribution with mean $\theta$
and variance $\theta^2$. A variance stabilizing transformation will be

\[ \alpha(x) = \int_a^x \theta^{-1}\,d\theta = \log(x), \]

where we have used $a = 1$.


7. Let $F_n$ be the c.d.f. of $X_n$. The most direct proof is to show that $\lim_{n\to\infty} F_n(x) = F(x)$ for every point at
which $F$ is continuous. Since $F$ is the c.d.f. of an integer-valued distribution, the continuity points are
all non-integer values of $x$ together with those integer values of $x$ to which $F$ assigns probability 0. It is
clear that it suffices to prove that $\lim_{n\to\infty} F_n(x) = F(x)$ for every non-integer $x$, because continuity of $F$
from the right and the fact that $F$ is nondecreasing will take care of the integers with zero probability.
For each non-integer $x$, let $m_x$ be the largest integer such that $m < x$. Then


where the convergence follows because the sums are finite. 


8. We know that $\Pr(X_n = m) = \binom{k}{m} p_n^m (1 - p_n)^{k-m}$ for $m = 0, \ldots, k$ and all $n$. We also know that

\[ \lim_{n\to\infty} \binom{k}{m} p_n^m (1 - p_n)^{k-m} = \binom{k}{m} p^m (1 - p)^{k-m}, \]

for all $m$. By Exercise 7, $X_n$ converges in distribution to the binomial distribution with parameters $k$
and $p$.


9. Let $X_1, \ldots, X_{16}$ be the times required to serve the 16 customers. The parameter of the exponential
distribution is 1/3. According to Theorem 5.7.8, the mean and variance of each $X_i$ are 3 and 9
respectively. Let $Y = \sum_{k=1}^{16} X_k$ be the total time. The central limit theorem approximation to the
distribution of $Y$ is the normal distribution with mean $16 \times 3 = 48$ and variance $16 \times 9 = 144$. The
approximate probability that $Y > 60$ is

\[ 1 - \Phi\left(\frac{60 - 48}{(144)^{1/2}}\right) = 1 - \Phi(1) = 0.1587. \]

The actual distribution of $Y$ is the gamma distribution with parameters 16 and 1/3. Using the gamma
c.d.f., the probability is 0.1565.


10. The number of defects in 2000 square-feet has the Poisson distribution with mean $2000 \times 0.01 = 20$.
The central limit theorem approximation is the normal distribution with mean 20 and variance 20.
Without correction for continuity, the approximate probability of at least 15 defects is

\[ 1 - \Phi\left(\frac{15 - 20}{(20)^{1/2}}\right) = 1 - \Phi(-1.1180) = 0.8682. \]

With the continuity correction, we get

\[ 1 - \Phi\left(\frac{14.5 - 20}{(20)^{1/2}}\right) = 1 - \Phi(-1.2298) = 0.8906. \]

The actual Poisson probability is 0.8951.
The gamma distribution with parameters n and 3 is the distribution of the sum of n i.i.d. exponential
random variables with parameter 3. If n is large, the central limit theorem should apply
to approximate the distribution of the sum of n exponentials.


The mean and variance of each exponential random variable are 1/3 and 1/9 respectively. The
distribution of the sum of n of these has approximately the normal distribution with mean n/3
and variance n/9.


The negative binomial distribution with parameters n and 0.2 is the distribution of the sum of n i.i.d.
geometric random variables with parameter 0.2. If n is large, the central limit theorem should
apply to approximate the distribution of the sum of n geometrics.


The mean and variance of each geometric random variable are $0.8/0.2 = 4$ and $0.8/(0.2)^2 = 20$.
The distribution of the sum of n of these has approximately the normal distribution with mean
4n and variance 20n.
Chapter 7


Estimation 


7.1 Statistical Inference 


Commentary 


Many students find statistical inference much more difficult to comprehend than elementary probability 
theory. For this reason, many examples of statistical inference problems have been introduced in the early 
chapters of this text. This will give instructors the opportunity to point back to relatively easy-to-understand 
examples that the students have already learned as a preview of what is to come. In addition to the examples 
mentioned in Sec. 7.1, some additional examples are Examples 2.3.3-2.3.5, 3.6.9, 3.7.14, 3.7.18, 4.8.9-4.8.10, 
and 5.8.1—5.8.2. In addition, the discussion of M.S.E. and M.A.E. in Sec. 4.5 and the discussion of the variance 
of the sample mean in Sec. 6.2 contain inferential ideas. Most of these are examples of Bayesian inference 
because the most common part of a Bayesian inference is the calculation of a conditional distribution or a 
conditional mean. 


Solutions to Exercises 


1. The random variables of interest are the observables $X_1, X_2, \ldots$ and the hypothetically observable
(parameter) $P$. The $X_i$'s are i.i.d. Bernoulli with parameter $p$ given $P = p$.


2. The statistical inferences mentioned in Example 7.1.3 are computing the conditional distribution of $P$
given observed data, computing the conditional mean of $P$ given the data, and computing the M.S.E.
of predictions of $P$ both before and after observing data.


3. The random variables of interest are the observables $Z_1, Z_2, \ldots$, the times at which successive particles
hit the target, and $\beta$, the hypothetically observable (parameter) rate of the Poisson process. The hit
times occur according to a Poisson process with rate $\beta$ conditional on $\beta$. Other random variables of
interest are the observable inter-arrival times $Y_1 = Z_1$ and $Y_k = Z_k - Z_{k-1}$ for $k \ge 2$.


4. The random variables of interest are the observable heights $X_1, \ldots, X_n$, the hypothetically observable
mean (parameter) $\mu$, and the sample mean $\bar{X}_n$. The $X_i$'s are modeled as normal random variables with
mean $\mu$ and variance 9 given $\mu$.


5. The statement that the interval $(\bar{X}_n - 0.98, \bar{X}_n + 0.98)$ has probability 0.95 of containing $\mu$ is an
inference.


6. The random variables of interest are the observable number $X$ of Mexican-American grand jurors and
the hypothetically observable (parameter) $P$. The conditional distribution of $X$ given $P = p$ is the
binomial distribution with parameters 220 and $p$. Also, $P$ has the beta distribution with parameters $\alpha$
and $\beta$, which have not yet been specified.


7. The random variables of interest are $Y$, the hypothetically observable number of oocysts in $t$ liters, the
hypothetically observable indicators $X_1, X_2, \ldots$ of whether each oocyst is counted, $X$ the observable
count of oocysts, the probability (parameter) $p$ of each oocyst being counted, and the (parameter) $\lambda$,
the rate of oocysts per liter. We model $Y$ as a Poisson random variable with mean $t\lambda$ given $\lambda$. We
model $X_1, \ldots, X_y$ as i.i.d. Bernoulli random variables with parameter $p$ given $p$ and given $Y = y$. We
define $X = X_1 + \cdots + X_Y$.


7.2 Prior and Posterior Distributions 


Commentary 


This section introduces some common terminology that is used in Bayesian inference. The concepts should all 
be familiar already to the students under other names. The prior distribution is just a marginal distribution 
while the posterior distribution is just a conditional distribution. The likelihood function might seem strange
since it is a conditional density for the data given $\theta$ but thought of as a function of $\theta$ after the data have
been observed.


Solutions to Exercises 


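1. We still have $y = 16178$, the sum of the five observed values. The posterior distribution of $\theta$ is now the
gamma distribution with parameters 6 and 21178. So,

\[ f(x_6 \mid x) = \int_0^\infty 7.518 \times 10^{23}\, \theta^5 \exp(-21178\theta)\, \theta \exp(-x_6\theta)\, d\theta \]
\[ = 7.518 \times 10^{23} \int_0^\infty \theta^6 \exp(-\theta[21178 + x_6])\, d\theta \]
\[ = 7.518 \times 10^{23}\, \frac{\Gamma(7)}{(21178 + x_6)^7} = \frac{5.413 \times 10^{26}}{(21178 + x_6)^7}, \]

for $x_6 > 0$. We can now compute $\Pr(X_6 > 3000 \mid x)$ as

\[ \Pr(X_6 > 3000 \mid x) = \int_{3000}^\infty \frac{5.413 \times 10^{26}}{(21178 + x_6)^7}\, dx_6 = \frac{5.413 \times 10^{26}}{6 \times 24178^6} = 0.4516. \]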
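The predictive probability in this solution has a simple closed form that can be checked numerically. The sketch below assumes the posterior gamma distribution with parameters 6 and 21178 stated in the solution; integrating $\exp(-3000\theta)$ against that posterior gives $(21178/24178)^6$. The variable names are illustrative.

```python
shape, rate = 6, 21178.0   # posterior gamma parameters taken from the solution
t = 3000.0

# Pr(X_6 > t | x) = E[exp(-theta * t)] under the posterior = (rate / (rate + t))**shape.
closed_form = (rate / (rate + t)) ** shape

# The same quantity written with the rounded constant 5.413e26 from the solution.
from_constant = 5.413e26 / (6 * (rate + t) ** 6)

print(closed_form, from_constant)
```

The two values should agree to about four decimal places; the small difference comes from rounding in the constant $5.413 \times 10^{26}$.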


2. The joint p.f. of the eight observations is given by Eq. (7.2.11). Since $n = 8$ and $y = 2$ in this exercise,

\[ f_n(x \mid \theta) = \theta^2(1 - \theta)^6. \]

Therefore,

\[ \xi(0.1 \mid x) = \Pr(\theta = 0.1 \mid x) = \frac{\xi(0.1) f_n(x \mid 0.1)}{\xi(0.1) f_n(x \mid 0.1) + \xi(0.2) f_n(x \mid 0.2)} \]
\[ = \frac{(0.7)(0.1)^2(0.9)^6}{(0.7)(0.1)^2(0.9)^6 + (0.3)(0.2)^2(0.8)^6} = 0.5418. \]

It follows that $\xi(0.2 \mid x) = 1 - \xi(0.1 \mid x) = 0.4582$.
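The posterior calculation for a parameter with finitely many possible values follows a fixed pattern: multiply prior by likelihood and normalize. A minimal sketch is given below, using the prior probabilities 0.7 and 0.3 and the likelihood $\theta^2(1-\theta)^6$ from this exercise; the function name `posterior` is a hypothetical helper.

```python
def posterior(prior, likelihood):
    """Posterior over a finite parameter set.
    prior: dict mapping each parameter value to its prior probability.
    likelihood: function returning the likelihood at a parameter value."""
    joint = {theta: prior[theta] * likelihood(theta) for theta in prior}
    total = sum(joint.values())
    return {theta: weight / total for theta, weight in joint.items()}

prior = {0.1: 0.7, 0.2: 0.3}
n, y = 8, 2   # eight observations, two successes (as in the exercise)
post = posterior(prior, lambda th: th**y * (1 - th) ** (n - y))

print(post)
```

The same helper applies to the next exercise by swapping in the Poisson likelihood and its prior.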
3. Let $X$ denote the number of defects on the selected roll of tape. Then for any given value of $\lambda$, the p.f.
of $X$ is

\[ f(x \mid \lambda) = \frac{\exp(-\lambda)\lambda^x}{x!} \quad \text{for } x = 0, 1, 2, \ldots. \]

Therefore,

\[ \xi(1.0 \mid X = 3) = \Pr(\lambda = 1.0 \mid X = 3) = \frac{\xi(1.0) f(3 \mid 1.0)}{\xi(1.0) f(3 \mid 1.0) + \xi(1.5) f(3 \mid 1.5)}. \]

From the table of the Poisson distribution in the back of the book it is found that

\[ f(3 \mid 1.0) = 0.0613 \quad \text{and} \quad f(3 \mid 1.5) = 0.1255. \]

Therefore, $\xi(1.0 \mid X = 3) = 0.2456$ and $\xi(1.5 \mid X = 3) = 1 - \xi(1.0 \mid X = 3) = 0.7544$.


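4. If $\alpha$ and $\beta$ denote the parameters of the gamma distribution, then we must have

\[ \frac{\alpha}{\beta} = 10 \quad \text{and} \quad \frac{\alpha}{\beta^2} = 5. \]

Therefore, $\alpha = 20$ and $\beta = 2$. Hence, the prior p.d.f. of $\theta$ is as follows, for $\theta > 0$:

\[ \xi(\theta) = \frac{2^{20}}{\Gamma(20)}\, \theta^{19} \exp(-2\theta). \]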


c= 


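5. If $\alpha$ and $\beta$ denote the parameters of the beta distribution, then we must have

\[ \frac{\alpha}{\alpha + \beta} = \frac{1}{3} \quad \text{and} \quad \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)} = \frac{2}{90}. \]

Since $\frac{\alpha}{\alpha + \beta} = \frac{1}{3}$, it follows that $\frac{\beta}{\alpha + \beta} = \frac{2}{3}$. Therefore,

\[ \frac{\alpha\beta}{(\alpha + \beta)^2} = \frac{\alpha}{\alpha + \beta} \cdot \frac{\beta}{\alpha + \beta} = \frac{1}{3} \cdot \frac{2}{3} = \frac{2}{9}. \]

It now follows from the second equation that $\frac{2}{9(\alpha + \beta + 1)} = \frac{2}{90}$ and, hence, that $\alpha + \beta + 1 = 10$.
Therefore, $\alpha + \beta = 9$ and it follows from the first equation that $\alpha = 3$ and $\beta = 6$. Hence, the prior
p.d.f. of $\theta$ is as follows, for $0 < \theta < 1$:

\[ \xi(\theta) = \frac{\Gamma(9)}{\Gamma(3)\Gamma(6)}\, \theta^2(1 - \theta)^5. \]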

6. The conditions of this exercise are precisely the conditions of Example 7.2.7 with $n = 8$ and $y = 3$.
Therefore, the posterior distribution of $\theta$ is a beta distribution with parameters $\alpha = 4$ and $\beta = 6$.

7. Since $f_n(x \mid \theta)$ is given by Eq. (7.2.11) with $n = 8$ and $y = 3$, then

\[ f_n(x \mid \theta)\xi(\theta) = 2\theta^3(1 - \theta)^6. \]

When we compare this expression with Eq. (5.8.3), we see that it has the same form as the p.d.f. of a
beta distribution with parameters $\alpha = 4$ and $\beta = 7$. Therefore, this beta distribution is the posterior
distribution of $\theta$.
8. By Eq. (7.2.14),

\[ \xi(\theta \mid x_1) \propto f(x_1 \mid \theta)\xi(\theta), \]

and by Eq. (7.2.15),

\[ \xi(\theta \mid x_1, x_2) \propto f(x_2 \mid \theta)\xi(\theta \mid x_1). \]

Hence,

\[ \xi(\theta \mid x_1, x_2) \propto f(x_1 \mid \theta) f(x_2 \mid \theta)\xi(\theta). \]

By continuing in this way, we find that

\[ \xi(\theta \mid x_1, x_2, x_3) \propto f(x_3 \mid \theta)\xi(\theta \mid x_1, x_2) \propto f(x_1 \mid \theta) f(x_2 \mid \theta) f(x_3 \mid \theta)\xi(\theta). \]

Ultimately, we will find that

\[ \xi(\theta \mid x_1, \ldots, x_n) \propto f(x_1 \mid \theta) \cdots f(x_n \mid \theta)\xi(\theta). \]

From Eq. (7.2.4) it follows that, in vector notation, this relation can be written as

\[ \xi(\theta \mid x) \propto f_n(x \mid \theta)\xi(\theta), \]

which is precisely the relation (7.2.10). Hence, when the appropriate factor is introduced on the right
side of this relation so that the proportionality symbol can be replaced by an equality, $\xi(\theta \mid x)$ will be
equal to the expression given in Eq. (7.2.7).


9. It follows from Exercise 8 that if the experiment yields a total of three defectives and five nondefectives,
the posterior distribution will be the same regardless of whether the eight items were selected in one
batch or one at a time in accordance with some stopping rule. Therefore, the posterior distribution in
this exercise will be the same beta distribution as that obtained in Exercise 6.


10. In this exercise

\[ f(x \mid \theta) = \begin{cases} 1 & \text{for } \theta - \frac{1}{2} < x < \theta + \frac{1}{2}, \\ 0 & \text{otherwise,} \end{cases} \]

and

\[ \xi(\theta) = \begin{cases} \frac{1}{10} & \text{for } 10 < \theta < 20, \\ 0 & \text{otherwise.} \end{cases} \]

The condition that $\theta - 1/2 < x < \theta + 1/2$ is the same as the condition that $x - 1/2 < \theta < x + 1/2$.
Therefore, $f(x \mid \theta)\xi(\theta)$ will be positive only for values of $\theta$ which satisfy both the requirement that
$x - 1/2 < \theta < x + 1/2$ and the requirement that $10 < \theta < 20$. Since $X = 12$ in this exercise, $f(x \mid \theta)\xi(\theta)$
is positive only for $11.5 < \theta < 12.5$. Furthermore, since $f(x \mid \theta)\xi(\theta)$ is constant over this interval, the
posterior p.d.f. $\xi(\theta \mid x)$ will also be constant over this interval. In other words, the posterior distribution
of $\theta$ must be a uniform distribution on this interval.
11. Let $y_1$ denote the smallest and let $y_6$ denote the largest of the six observations. Then the joint p.d.f.
of the six observations is

\[ f_n(x \mid \theta) = \begin{cases} 1 & \text{for } \theta - \frac{1}{2} < y_1 < y_6 < \theta + \frac{1}{2}, \\ 0 & \text{otherwise.} \end{cases} \]

The condition that $\theta - \frac{1}{2} < y_1 < y_6 < \theta + \frac{1}{2}$ is the same as the condition that $y_6 - 1/2 < \theta < y_1 + 1/2$.
Since $\xi(\theta)$ is again as given in Exercise 10, it follows that $f_n(x \mid \theta)\xi(\theta)$ will be positive only for values of
$\theta$ which satisfy both the requirement that $y_6 - 1/2 < \theta < y_1 + 1/2$ and the requirement that $10 < \theta < 20$. Since
$y_1 = 10.9$ and $y_6 = 11.7$ in this exercise, $f_n(x \mid \theta)\xi(\theta)$ is positive only for $11.2 < \theta < 11.4$. Furthermore,
since $f_n(x \mid \theta)\xi(\theta)$ is constant over this interval, the posterior p.d.f. $\xi(\theta \mid x)$ will also be constant over
the interval. In other words, the posterior distribution of $\theta$ must be a uniform distribution on this
interval.


7.3 Conjugate Prior Distributions 


Commentary 


This section introduces some convenient prior distributions that make Bayesian inferences mathematically 
tractable. The instructor can remind the student that numerical methods are available for performing 
Bayesian inferences even when other prior distributions are used. Mathematical tractability is useful when 
introducing a new concept so that attention can focus on the meaning and interpretation of the new concept 
rather than the numerical methods required to perform the calculations. Although conjugate priors for the 
parameter of the uniform distribution are not discussed in the body of the section, Exercises 17 and 18
illustrate how the general concept extends to these distributions. 


Solutions to Exercises 


1. The posterior mean of $\theta$ will be

\[
\frac{100 \times 0 + 20 v^2 \times 0.125}{100 + 20 v^2} = 0.12.
\]

We can solve this equation for $v^2$ by multiplying both sides by $100 + 20v^2$ and collecting terms. The
result is $v^2 = 120$.
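As a check on the algebra, the equation can be solved exactly with rational arithmetic. The sketch below is illustrative only; it reads the displayed equation as $(\sigma^2\mu + n v^2 \bar{x}_n)/(\sigma^2 + n v^2) = 0.12$ with $\sigma^2 = 100$, $\mu = 0$, $n = 20$, and $\bar{x}_n = 0.125$ (an interpretation of the numbers in the display), and the function name is hypothetical.

```python
from fractions import Fraction


def prior_variance_for_posterior_mean(sigma2, mu, n, xbar, target):
    """Solve (sigma2*mu + n*v2*xbar) / (sigma2 + n*v2) = target for v2."""
    # Cross-multiplying and collecting the v2 terms gives a linear equation.
    return sigma2 * (target - mu) / (n * (xbar - target))


v2 = prior_variance_for_posterior_mean(
    Fraction(100), Fraction(0), 20, Fraction(1, 8), Fraction(3, 25)
)
# Substitute back into the posterior-mean formula as a consistency check.
posterior_mean = (100 * 0 + 20 * v2 * Fraction(1, 8)) / (100 + 20 * v2)
print(v2, posterior_mean)
```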


2. If we let $\gamma = (y + 1)/(y + z + 2)$, then $1 - \gamma = (z + 1)/(y + z + 2)$ and $V = \gamma(1 - \gamma)/(y + z + 3)$. The
maximum value of $\gamma(1 - \gamma)$ is 1/4, and is attained when $\gamma = 1/2$. Therefore, $V \le 1/[4(y + z + 3)]$. It
now follows that if $1/[4(y + z + 3)] \le 0.01$, then $V \le 0.01$. But the first inequality will be satisfied if
$y + z \ge 22$. Since $y + z$ is the total number of items that have been selected, it follows that this number
need not exceed 22.


3. Since the observed number of defective items is 3 and the observed number of nondefective items is 97,
it follows from Theorem 7.3.1 that the posterior distribution of $\theta$ is a beta distribution with parameters
$2 + 3 = 5$ and $200 + 97 = 297$.

4. Let $\alpha_1$ and $\beta_1$ denote the parameters of the posterior beta distribution, and let $\gamma = \alpha_1/(\alpha_1 + \beta_1)$. Then
$\gamma$ is the mean of the posterior distribution and we are told that $\gamma = 2/51$. The variance of the posterior
distribution is

\[
\frac{\alpha_1 \beta_1}{(\alpha_1 + \beta_1)^2(\alpha_1 + \beta_1 + 1)}
= \frac{\alpha_1}{\alpha_1 + \beta_1} \cdot \frac{\beta_1}{\alpha_1 + \beta_1} \cdot \frac{1}{\alpha_1 + \beta_1 + 1}
= \frac{2}{51} \cdot \frac{49}{51} \cdot \frac{1}{\alpha_1 + \beta_1 + 1}
= \frac{98}{(51)^2} \cdot \frac{1}{\alpha_1 + \beta_1 + 1}.
\]

From the value of this variance given in the exercise it is now evident that $\alpha_1 + \beta_1 + 1 = 103$. Hence,
$\alpha_1 + \beta_1 = 102$ and $\alpha_1 = \gamma(\alpha_1 + \beta_1) = 2(102)/51 = 4$. In turn, it follows that $\beta_1 = 102 - 4 = 98$.
Since the posterior distribution is a beta distribution, it follows from Theorem 7.3.1 that the prior
distribution must have been a beta distribution with parameters $\alpha$ and $\beta$ such that $\alpha + 3 = \alpha_1$ and
$\beta + 97 = \beta_1$. Therefore, $\alpha = \beta = 1$. But the beta distribution for which $\alpha = \beta = 1$ is the uniform
distribution on the interval [0,1].


5. By Theorem 7.3.2, the posterior distribution will be the gamma distribution for which the parameters
are $3 + \sum_{i=1}^{n} x_i = 3 + 13 = 16$ and $1 + n = 1 + 5 = 6$.

6. The number of defects on a 1200-foot roll of tape has the same distribution as the total number of
defects on twelve 100-foot rolls, and it is assumed that the number of defects on a 100-foot roll has
the Poisson distribution with mean $\theta$. By Theorem 7.3.2, the posterior distribution of $\theta$ is the gamma
distribution for which the parameters are $2 + 4 = 6$ and $10 + 12 = 22$.


7. In the notation of Theorem 7.3.3, we have $\sigma^2 = 4$, $\mu = 68$, $v^2 = 1$, $n = 10$, and $\bar{x}_n = 69.5$. Therefore,
the posterior distribution of $\theta$ is the normal distribution with mean $\mu_1 = 967/14$ and variance $v_1^2 = 2/7$.
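The update in Exercise 7 can be reproduced exactly with rational arithmetic. This is a minimal sketch, assuming the standard normal-normal conjugate formulas (posterior mean $(\sigma^2\mu + n v^2 \bar{x}_n)/(\sigma^2 + n v^2)$ and posterior variance $\sigma^2 v^2/(\sigma^2 + n v^2)$); the function name `normal_posterior` is hypothetical.

```python
from fractions import Fraction


def normal_posterior(sigma2, mu, v2, n, xbar):
    """Posterior mean and variance of a normal mean with known variance sigma2.

    The prior is normal with mean mu and variance v2; xbar is the sample mean
    of n observations.
    """
    denom = sigma2 + n * v2
    mean = (sigma2 * mu + n * v2 * xbar) / denom
    var = sigma2 * v2 / denom
    return mean, var


mu1, v1_sq = normal_posterior(
    Fraction(4), Fraction(68), Fraction(1), 10, Fraction(139, 2)
)
print(mu1, float(mu1), v1_sq, float(v1_sq))
```

The decimal values of the two results are the 69.07 and 0.286 quoted later in Exercise 8 of Sec. 7.4.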


8. Since the p.d.f. of a normal distribution attains its maximum value at the mean of the distribution
and then drops off on each side of the mean, among all intervals of length 1 unit, the interval that is
centered at the mean will contain the most probability. Therefore, the answer in part (a) is the interval
centered at the mean of the prior distribution of $\theta$ and the answer in part (b) is the interval centered at
the mean of the posterior distribution of $\theta$. In part (c), if the distribution of $\theta$ is specified by its prior
distribution, then $Z = \theta - 68$ will have a standard normal distribution. Therefore,

\[
\Pr(67.5 < \theta < 68.5) = \Pr(-0.5 < Z < 0.5) = 2\Phi(0.5) - 1 = 0.3830.
\]

Similarly, if the distribution of $\theta$ is specified by its posterior distribution, then $Z = (\theta - \mu_1)/v_1 =
(\theta - 69.07)/0.5345$ will have a standard normal distribution. Therefore,

\[
\Pr(68.57 < \theta < 69.57 \mid \boldsymbol{x}) = \Pr(-0.9355 < Z < 0.9355) = 2\Phi(0.9355) - 1 = 0.6506.
\]
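The two probabilities in part (c) can be recomputed without a table using Python's standard library. This is only a sketch: the helper name `centered_interval_prob` is hypothetical, and the prior and posterior parameters are taken from Exercise 7. Small differences in the fourth decimal place relative to the table-based values are to be expected.

```python
from statistics import NormalDist

# Prior and posterior distributions of theta, using the values from Exercise 7.
prior = NormalDist(mu=68, sigma=1)
posterior = NormalDist(mu=967 / 14, sigma=(2 / 7) ** 0.5)


def centered_interval_prob(dist, length=1.0):
    """Probability of the interval of the given length centered at the mean."""
    lo = dist.mean - length / 2
    hi = dist.mean + length / 2
    return dist.cdf(hi) - dist.cdf(lo)


p_prior = centered_interval_prob(prior)
p_post = centered_interval_prob(posterior)
print(round(p_prior, 4), round(p_post, 4))
```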


9. Since the posterior distribution of $\theta$ is normal, the prior distribution of $\theta$ must also have been normal.
Furthermore, from Eqs. (7.3.1) and (7.3.2), we obtain the relations:

\[
8 = \frac{\mu + (20)(10)v^2}{1 + 20v^2}
\]

and

\[
\frac{1}{25} = \frac{v^2}{1 + 20v^2}.
\]

It follows that $v^2 = 1/5$ and $\mu = 0$.

10. In this exercise, $\sigma^2 = 4$ and $v^2 = 1$. Therefore, by Eq. (7.3.2),

\[
v_1^2 = \frac{4}{4 + n}.
\]

It follows that $v_1^2 \le 0.01$ if and only if $n \ge 396$.

11. In this exercise, $\sigma^2 = 4$ and $n = 100$. Therefore, by Eq. (7.3.2),

\[
v_1^2 = \frac{4v^2}{4 + 100v^2} = \frac{1}{25 + (1/v^2)} < \frac{1}{25}.
\]

Since the variance of the posterior distribution is less than 1/25, the standard deviation must be less
than 1/5.


12. Let $\alpha$ and $\beta$ denote the parameters of the prior gamma distribution of $\theta$. Then $\alpha/\beta = 0.2$ and
$\alpha/\beta^2 = 1$. Therefore, $\beta = 0.2$ and $\alpha = 0.04$. Furthermore, the total time required to serve the sample
of 20 customers is $y = 20(3.8) = 76$. Therefore, by Theorem 7.3.4, the posterior distribution of $\theta$ is the
gamma distribution for which the parameters are $0.04 + 20 = 20.04$ and $0.2 + 76 = 76.2$.

13. The mean of the gamma distribution with parameters $\alpha$ and $\beta$ is $\alpha/\beta$ and the standard deviation is
$\alpha^{1/2}/\beta$. Therefore, the coefficient of variation is $\alpha^{-1/2}$. Since the coefficient of variation of the prior
gamma distribution of $\theta$ is 2, it follows that $\alpha = 1/4$ in the prior distribution. Furthermore, it now
follows from Theorem 7.3.4 that the coefficient of variation of the posterior gamma distribution of $\theta$ is
$(\alpha + n)^{-1/2} = (n + 1/4)^{-1/2}$. This value will be less than 0.1 if and only if $n > 99.75$. Thus, the required
sample size is $n \ge 100$.
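The sample-size bound can be confirmed by a direct search. The sketch below is illustrative only: the function name `min_sample_size` is hypothetical, and it assumes the posterior coefficient of variation $(\alpha + n)^{-1/2}$ derived above.

```python
def min_sample_size(alpha, cv_target):
    """Smallest n with posterior coefficient of variation below cv_target.

    The posterior gamma distribution has first parameter alpha + n, so its
    coefficient of variation is (alpha + n) ** -0.5.
    """
    n = 0
    while (alpha + n) ** -0.5 >= cv_target:
        n += 1
    return n


n_required = min_sample_size(alpha=0.25, cv_target=0.1)
print(n_required)
```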


14. Consider a single observation $X$ from a negative binomial distribution with parameters $r$ and $p$, where
the value of $r$ is known and the value of $p$ is unknown. Then the p.f. of $X$ has the form $f(x \mid p) \propto p^r q^x$.
If the prior distribution of $p$ is the beta distribution with parameters $\alpha$ and $\beta$, then the prior p.d.f. $\xi(p)$
has the form $\xi(p) \propto p^{\alpha - 1} q^{\beta - 1}$. Therefore, the posterior p.d.f. $\xi(p \mid x)$ has the form

\[
\xi(p \mid x) \propto \xi(p) f(x \mid p) \propto p^{\alpha + r - 1} q^{\beta + x - 1}.
\]

This expression can be recognized as being, except for a constant factor, the p.d.f. of the beta
distribution with parameters $\alpha + r$ and $\beta + x$. Since this distribution will be the prior distribution of $p$ for
future observations, it follows that the posterior distribution after any number of observations will also
be a beta distribution.


15. (a) Let $y = 1/\theta$. Then $\theta = 1/y$ and $d\theta = -dy/y^2$. Hence,

\[
\int_0^\infty \xi(\theta)\,d\theta = \int_0^\infty \frac{\beta^\alpha}{\Gamma(\alpha)}\, y^{\alpha - 1} \exp(-\beta y)\,dy = 1.
\]

(b) If an observation $X$ has a normal distribution with a known value of the mean $\mu$ and an unknown
value of the variance $\theta$, then the p.d.f. of $X$ has the form

\[
f(x \mid \theta) \propto \frac{1}{\theta^{1/2}} \exp\left[-\frac{(x - \mu)^2}{2\theta}\right].
\]

Also, the prior p.d.f. of $\theta$ has the form

\[
\xi(\theta) \propto \theta^{-(\alpha + 1)} \exp(-\beta/\theta).
\]

Therefore, the posterior p.d.f. $\xi(\theta \mid x)$ has the form

\[
\xi(\theta \mid x) \propto \xi(\theta) f(x \mid \theta) \propto \theta^{-(\alpha + 3/2)} \exp\left\{-\left[\beta + \frac{1}{2}(x - \mu)^2\right] \frac{1}{\theta}\right\}.
\]

Hence, the posterior p.d.f. of $\theta$ has the same form as $\xi(\theta)$ with $\alpha$ replaced by $\alpha + 1/2$ and $\beta$
replaced by $\beta + \frac{1}{2}(x - \mu)^2$. Since this distribution will be the prior distribution of $\theta$ for future
observations, it follows that the posterior distribution after any number of observations will also
belong to the same family of distributions.


16. If $X$ has the normal distribution with a known value of the mean $\mu$ and an unknown value of the
standard deviation $\sigma$, then the p.d.f. of $X$ has the form

\[
f(x \mid \sigma) \propto \frac{1}{\sigma} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right].
\]

Therefore, if the prior p.d.f. $\xi(\sigma)$ has the form

\[
\xi(\sigma) \propto \sigma^{-a} \exp(-b/\sigma^2),
\]

then the posterior p.d.f. of $\sigma$ will also have the same form, with $a$ replaced by $a + 1$ and $b$ replaced by
$b + (x - \mu)^2/2$. It remains to determine the precise form of $\xi(\sigma)$. If we let $y = 1/\sigma^2$, then $\sigma = y^{-1/2}$
and $d\sigma = -dy/(2y^{3/2})$. Therefore,

\[
\int_0^\infty \sigma^{-a} \exp(-b/\sigma^2)\,d\sigma = \frac{1}{2} \int_0^\infty y^{(a - 3)/2} \exp(-by)\,dy.
\]

The integral will be finite if $a > 1$ and $b > 0$, and its value will be

\[
\frac{\Gamma\left[\frac{1}{2}(a - 1)\right]}{b^{(a - 1)/2}}.
\]

Hence, for $a > 1$ and $b > 0$, the following function will be a p.d.f. for $\sigma > 0$:

\[
\xi(\sigma) = \frac{2 b^{(a - 1)/2}}{\Gamma\left[\frac{1}{2}(a - 1)\right]}\, \sigma^{-a} \exp(-b/\sigma^2).
\]

Finally, we can obtain a more standard form for this p.d.f. by replacing $a$ and $b$ by $\alpha = (a - 1)/2$ and
$\beta = b$. Then

\[
\xi(\sigma) = \frac{2\beta^\alpha}{\Gamma(\alpha)}\, \sigma^{-(2\alpha + 1)} \exp(-\beta/\sigma^2) \quad \text{for } \sigma > 0.
\]

The family of distributions for which the p.d.f. has this form, for all values of $\alpha > 0$ and $\beta > 0$, will be
a conjugate family of prior distributions.

17. The joint p.d.f. of the three observations is

\[
f(x_1, x_2, x_3 \mid \theta) =
\begin{cases}
1/\theta^3 & \text{for } 0 < x_i < \theta \ (i = 1, 2, 3), \\
0 & \text{otherwise.}
\end{cases}
\]

Therefore, the posterior p.d.f. $\xi(\theta \mid x_1, x_2, x_3)$ will be positive only if $\theta \ge 4$, as required by the prior
p.d.f., and also $\theta > 8$, the largest of the three observed values. Hence, for $\theta > 8$,

\[
\xi(\theta \mid x_1, x_2, x_3) \propto \xi(\theta) f(x_1, x_2, x_3 \mid \theta) \propto 1/\theta^7.
\]

Since

\[
\int_8^\infty \frac{1}{\theta^7}\,d\theta = \frac{1}{6(8^6)},
\]

it follows that

\[
\xi(\theta \mid x_1, x_2, x_3) =
\begin{cases}
6(8^6)/\theta^7 & \text{for } \theta > 8, \\
0 & \text{for } \theta \le 8.
\end{cases}
\]


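18. Suppose that the prior distribution of $\theta$ is the Pareto distribution with parameters $x_0$ and $\alpha$ ($x_0 > 0$
and $\alpha > 0$). Then the prior p.d.f. $\xi(\theta)$ has the form

\[
\xi(\theta) \propto 1/\theta^{\alpha + 1} \quad \text{for } \theta \ge x_0.
\]

If $X_1, \ldots, X_n$ form a random sample from a uniform distribution on the interval $[0, \theta]$, then

\[
f_n(\boldsymbol{x} \mid \theta) \propto 1/\theta^n \quad \text{for } \theta > \max\{x_1, \ldots, x_n\}.
\]

Hence, the posterior p.d.f. of $\theta$ has the form

\[
\xi(\theta \mid \boldsymbol{x}) \propto \xi(\theta) f_n(\boldsymbol{x} \mid \theta) \propto 1/\theta^{\alpha + n + 1},
\]

for $\theta > \max\{x_0, x_1, \ldots, x_n\}$, and $\xi(\theta \mid \boldsymbol{x}) = 0$ for $\theta \le \max\{x_0, x_1, \ldots, x_n\}$. This posterior p.d.f. can
now be recognized as also being the Pareto distribution with parameters $\alpha + n$ and $\max\{x_0, x_1, \ldots, x_n\}$.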


Commentary: Exercise 17 provides a numerical illustration of the general result presented in Exercise 18. 


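19. The joint p.d.f. of $X_1, \ldots, X_n$ has the following form, for $0 < x_i < 1$ $(i = 1, \ldots, n)$:

\[
f_n(\boldsymbol{x} \mid \theta) = \theta^n \left(\prod_{i=1}^n x_i\right)^{\theta - 1}
\propto \theta^n \left(\prod_{i=1}^n x_i\right)^{\theta}
= \theta^n \exp\left(\theta \sum_{i=1}^n \log x_i\right).
\]

The prior p.d.f. of $\theta$ has the form

\[
\xi(\theta) \propto \theta^{\alpha - 1} \exp(-\beta\theta).
\]

Hence, the posterior p.d.f. of $\theta$ has the form

\[
\xi(\theta \mid \boldsymbol{x}) \propto \xi(\theta) f_n(\boldsymbol{x} \mid \theta) \propto \theta^{\alpha + n - 1} \exp\left[-\left(\beta - \sum_{i=1}^n \log x_i\right)\theta\right].
\]

This expression can be recognized as being, except for a constant factor, the p.d.f. of the gamma
distribution with parameters $\alpha_1 = \alpha + n$ and $\beta_1 = \beta - \sum_{i=1}^n \log x_i$. Therefore, the mean of the posterior
distribution is $\alpha_1/\beta_1$ and the variance is $\alpha_1/\beta_1^2$.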


20. The mean lifetime conditional on $\beta$ is $1/\beta$. The mean lifetime is then the mean of $1/\beta$. Prior to
observing the data, the distribution of $\beta$ is the gamma distribution with parameters $a$ and $b$, so the
mean of $1/\beta$ is $b/(a - 1)$ according to Exercise 21 in Sec. 5.7. After observing the data, the distribution
of $\beta$ is the gamma distribution with parameters $a + 10$ and $b + 60$, so the mean of $1/\beta$ is $(b + 60)/(a + 9)$.
So, we must solve the following equations:

\[
\frac{b}{a - 1} = 4, \qquad \frac{b + 60}{a + 9} = 5.
\]

These equations convert easily to the equations $b = 4a - 4$ and $b = 5a - 15$. So $a = 11$ and $b = 40$.
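The two linear equations can be solved mechanically, which gives a quick check of the values of $a$ and $b$. This is a sketch only; the function name `solve_gamma_prior` is hypothetical, and the prior and posterior mean lifetimes of 4 and 5 are read off the equations $b = 4a - 4$ and $b = 5a - 15$ above.

```python
from fractions import Fraction


def solve_gamma_prior(prior_mean, post_mean, n, total):
    """Solve b/(a-1) = prior_mean and (b+total)/(a+n-1) = post_mean for (a, b)."""
    # Substituting b = prior_mean*(a - 1) into the second equation gives a
    # single linear equation in a.
    a = (post_mean * (n - 1) + prior_mean - total) / (prior_mean - post_mean)
    b = prior_mean * (a - 1)
    return a, b


a, b = solve_gamma_prior(Fraction(4), Fraction(5), n=10, total=60)
print(a, b)
```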


21. The posterior p.d.f. is proportional to the likelihood $\theta^n \exp\left(-\theta \sum_{i=1}^n x_i\right)$ times $1/\theta$. This product can
be written as $\theta^{n - 1} \exp(-\theta n \bar{x}_n)$. As a function of $\theta$ this is recognizable as the p.d.f. of the gamma
distribution with parameters $n$ and $n\bar{x}_n$. The mean of this posterior distribution is then $n/[n\bar{x}_n] = 1/\bar{x}_n$.


22. The posterior p.d.f. is proportional to the likelihood since the prior "p.d.f." is constant. The likelihood
is proportional to

\[
\exp\left[-\frac{20}{2(60)} \left(\theta - (-0.95)\right)^2\right],
\]

using the same reasoning as in the proof of Theorem 7.3.3 of the text. As a function of $\theta$ this is easily
recognized as being proportional to the p.d.f. of the normal distribution with mean $-0.95$ and variance
$60/20 = 3$. The posterior probability that $\theta > 1$ is then

\[
1 - \Phi\left(\frac{1 - (-0.95)}{3^{1/2}}\right) = 1 - \Phi(1.126) = 0.1301.
\]
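The tail probability can be recomputed directly from the posterior mean and variance stated above. This is a minimal sketch using Python's standard library; the variable names are illustrative.

```python
from statistics import NormalDist

# Posterior of theta: normal with mean -0.95 and variance 60/20 = 3.
posterior = NormalDist(mu=-0.95, sigma=(60 / 20) ** 0.5)

# Standardized distance from the posterior mean to 1.
z = (1 - posterior.mean) / posterior.stdev
# Posterior probability that theta exceeds 1.
p_gt_1 = 1 - posterior.cdf(1)
print(round(z, 3), round(p_gt_1, 4))
```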


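23. (a) Let the prior p.d.f. of $\theta$ be $\xi_{\alpha,\beta}(\theta)$. Suppose that $X_1, \ldots, X_n$ are i.i.d. with conditional p.d.f. $f(x \mid \theta)$
given $\theta$, where $f$ is as stated in the exercise. The posterior p.d.f. after observing these data is

\[
\xi(\theta \mid \boldsymbol{x}) = \frac{a(\theta)^{\alpha + n} \exp\left[c(\theta)\left\{\beta + \sum_{i=1}^n d(x_i)\right\}\right]}
{\int_\Omega a(\theta)^{\alpha + n} \exp\left[c(\theta)\left\{\beta + \sum_{i=1}^n d(x_i)\right\}\right] d\theta}.
\tag{S.7.1}
\]

Eq. (S.7.1) is of the form of $\xi_{\alpha',\beta'}(\theta)$ with $\alpha' = \alpha + n$ and $\beta' = \beta + \sum_{i=1}^n d(x_i)$. The integral in
the denominator of (S.7.1) must be finite with probability 1 (as a function of $x_1, \ldots, x_n$) because
$\prod_{i=1}^n b(x_i)$ times this denominator is the marginal (joint) p.d.f. of $X_1, \ldots, X_n$.

(b) This was essentially the calculation done in part (a).

24. In each part of this exercise we shall first present the p.d.f. or the p.f. $f$, and then we shall identify the
functions $a$, $b$, $c$, and $d$ in the form for an exponential family given in Exercise 23.

(a) $f(x \mid p) = p^x (1 - p)^{1 - x} = (1 - p)\left(\frac{p}{1 - p}\right)^x$. Therefore, $a(p) = 1 - p$, $b(x) = 1$, $c(p) = \log\left(\frac{p}{1 - p}\right)$,
$d(x) = x$.

(b) $f(x \mid \theta) = \frac{\exp(-\theta)\theta^x}{x!}$. Therefore, $a(\theta) = \exp(-\theta)$, $b(x) = 1/x!$, $c(\theta) = \log\theta$, $d(x) = x$.

(c) $f(x \mid p) = \binom{r + x - 1}{x} p^r (1 - p)^x$. Therefore, $a(p) = p^r$, $b(x) = \binom{r + x - 1}{x}$, $c(p) = \log(1 - p)$,
$d(x) = x$.

(d)
\[
f(x \mid \mu) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]
= \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left(-\frac{\mu^2}{2\sigma^2}\right) \exp\left(-\frac{x^2}{2\sigma^2}\right) \exp\left(\frac{\mu x}{\sigma^2}\right).
\]
Therefore, $a(\mu) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left(-\frac{\mu^2}{2\sigma^2}\right)$, $b(x) = \exp\left(-\frac{x^2}{2\sigma^2}\right)$, $c(\mu) = \frac{\mu}{\sigma^2}$, $d(x) = x$.

(e) $f(x \mid \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]$. Therefore, $a(\sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}}$, $b(x) = 1$, $c(\sigma^2) = -\frac{1}{2\sigma^2}$,
$d(x) = (x - \mu)^2$.

(f) $f(x \mid \alpha) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} \exp(-\beta x)$. Therefore, $a(\alpha) = \frac{\beta^\alpha}{\Gamma(\alpha)}$, $b(x) = \exp(-\beta x)$, $c(\alpha) = \alpha - 1$,
$d(x) = \log x$.

(g) $f(x \mid \beta)$ in this part is the same as the p.d.f. in part (f). Therefore, $a(\beta) = \beta^\alpha$, $b(x) = \frac{x^{\alpha - 1}}{\Gamma(\alpha)}$,
$c(\beta) = -\beta$, $d(x) = x$.

(h) $f(x \mid \alpha) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha - 1} (1 - x)^{\beta - 1}$. Therefore, $a(\alpha) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)}$, $b(x) = \frac{(1 - x)^{\beta - 1}}{\Gamma(\beta)}$, $c(\alpha) = \alpha - 1$,
$d(x) = \log x$.

(i) $f(x \mid \beta)$ in this part is the same as the p.d.f. given in part (h). Therefore, $a(\beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\beta)}$,
$b(x) = \frac{x^{\alpha - 1}}{\Gamma(\alpha)}$, $c(\beta) = \beta - 1$, $d(x) = \log(1 - x)$.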


25. For every $\theta$, the p.d.f. (or p.f.) $f(x \mid \theta)$ for an exponential family is strictly positive for all $x$ such that
$b(x) > 0$. That is, the set of $x$ for which $f(x \mid \theta) > 0$ is the same for all $\theta$. This is not true for uniform
distributions, where the set of $x$ such that $f(x \mid \theta) > 0$ is $[0, \theta]$.


26. The same reasoning applies as in the previous exercise. For uniform distributions, the set of $x$ such that
$f(x \mid \theta) > 0$ depends on $\theta$. For exponential families, the set of $x$ such that $f(x \mid \theta) > 0$ is the same for all
$\theta$.


7.4 Bayes Estimators


Commentary 


We introduce the fundamental concepts of Bayesian decision theory. The use of a loss function arises again 
in Bayesian hypothesis testing in Sec. 9.8. This section ends with foundational discussion of the limitations 
of Bayes estimators. This material is included for those instructors who want their students to have both a 
working and a critical understanding of the topic. 

If you are using the statistical software R, the function mentioned in Example 7.4.5 to compute the
median of a beta distribution is qbeta with the first argument equal to 0.5 and the next two equal to $\alpha + y$
and $\beta + n - y$, in the notation of the example.


Solutions to Exercises 


1. The posterior distribution of $\theta$ would be the beta distribution with parameters 2 and 1. The mean
of the posterior distribution is 2/3, which would be the Bayes estimate under squared error loss. The
median of the posterior distribution would be the Bayes estimate under absolute error loss. To find the
median, write the c.d.f. as

\[
F(\theta) = \int_0^\theta 2t\,dt = \theta^2,
\]

for $0 < \theta < 1$. The quantile function is then $F^{-1}(p) = p^{1/2}$, so the median is $(1/2)^{1/2} = 0.7071$.
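When the c.d.f. cannot be inverted in closed form, the median can be found numerically. The sketch below illustrates this with bisection on the c.d.f. $F(\theta) = \theta^2$ from this exercise, so that the result can be compared with the closed-form answer. The function name `quantile_by_bisection` is hypothetical, and the method assumes a continuous, increasing c.d.f. on the search interval.

```python
def quantile_by_bisection(cdf, p, lo=0.0, hi=1.0, tol=1e-12):
    """Find q in [lo, hi] with cdf(q) = p, for a continuous increasing cdf."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid  # the quantile lies to the right of mid
        else:
            hi = mid  # the quantile lies at or to the left of mid
    return (lo + hi) / 2


# Posterior c.d.f. of the beta distribution with parameters 2 and 1.
median = quantile_by_bisection(lambda theta: theta ** 2, 0.5)
posterior_mean = 2 / 3
print(round(median, 4), round(posterior_mean, 4))
```

The median exceeds the mean here, so the Bayes estimates under absolute error loss and squared error loss differ.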


2. The posterior distribution of $\theta$ is the beta distribution with parameters $5 + 1 = 6$ and $10 + 19 = 29$.
The mean of this distribution is $6/(6 + 29) = 6/35$. Therefore, the Bayes estimate of $\theta$ is 6/35.


3. If $y$ denotes the number of defective items in the sample, then the posterior distribution of $\theta$ will be
the beta distribution with parameters $5 + y$ and $10 + 20 - y = 30 - y$. The variance $V$ of this beta
distribution is

\[
V = \frac{(5 + y)(30 - y)}{(35)^2(36)}.
\]

Since the Bayes estimate of $\theta$ is the mean $\mu$ of the posterior distribution, the mean squared error of
this estimate is $E[(\theta - \mu)^2 \mid \boldsymbol{x}]$, which is the variance $V$ of the posterior distribution.

(a) $V$ will attain its maximum at a value of $y$ for which $(5 + y)(30 - y)$ is a maximum. By differentiating
with respect to $y$ and setting the derivative equal to 0, we find that the maximum is attained when
$y = 12.5$. Since the number of defective items $y$ must be an integer, the maximum of $V$ will be
attained for $y = 12$ or $y = 13$. When these values are substituted into $(5 + y)(30 - y)$, it is found
that they both yield the same value.

(b) Since $(5 + y)(30 - y)$ is a quadratic function of $y$ and the coefficient of $y^2$ is negative, its minimum
value over the interval $0 \le y \le 20$ will be attained at one of the endpoints of the interval. It is
found that the value for $y = 0$ is smaller than the value for $y = 20$.
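Since $y$ can take only the integer values 0 through 20, both parts can be confirmed by direct enumeration. The following Python sketch is illustrative only and works with the numerator $(5 + y)(30 - y)$ of the posterior variance, because the denominator does not depend on $y$.

```python
# Numerator of the posterior variance for each possible number of defectives.
scores = {y: (5 + y) * (30 - y) for y in range(21)}

max_value = max(scores.values())
# All values of y at which the posterior variance is largest.
argmax = [y for y, s in scores.items() if s == max_value]
# The value of y at which the posterior variance is smallest.
argmin = min(scores, key=scores.get)

print(argmax, max_value, argmin, scores[0], scores[20])
```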




4. Suppose that the parameters of the prior beta distribution of $\theta$ are $\alpha$ and $\beta$. Then $\mu_0 = \alpha/(\alpha + \beta)$. As
shown in Example 7.4.3, the mean of the posterior distribution of $\theta$ is

\[
\frac{\alpha + \sum_{i=1}^n X_i}{\alpha + \beta + n} = \frac{\alpha + \beta}{\alpha + \beta + n}\,\mu_0 + \frac{n}{\alpha + \beta + n}\,\bar{X}_n.
\]

Hence, $\gamma_n = n/(\alpha + \beta + n)$ and $\gamma_n \to 1$ as $n \to \infty$.

5. It was shown in Exercise 5 of Sec. 7.3 that the posterior distribution of $\theta$ is the gamma distribution
with parameters $\alpha = 16$ and $\beta = 6$. The Bayes estimate of $\theta$ is the mean of this distribution and is
equal to $16/6 = 8/3$.


6. Suppose that the parameters of the prior gamma distribution of $\theta$ are $\alpha$ and $\beta$. Then $\mu_0 = \alpha/\beta$. The
posterior distribution of $\theta$ was given in Theorem 7.3.2. The mean of this posterior distribution is

\[
\frac{\alpha + \sum_{i=1}^n X_i}{\beta + n} = \frac{\beta}{\beta + n}\,\mu_0 + \frac{n}{\beta + n}\,\bar{X}_n.
\]

Hence, $\gamma_n = n/(\beta + n)$ and $\gamma_n \to 1$ as $n \to \infty$.

7. The Bayes estimator is the mean of the posterior distribution of $\theta$, as given in Exercise 6. Since $\theta$ is
the mean of the Poisson distribution, it follows from the law of large numbers that $\bar{X}_n$ converges to $\theta$
in probability as $n \to \infty$. It now follows from Exercise 6 that, since $\gamma_n \to 1$, the Bayes estimators will
also converge to $\theta$ in probability as $n \to \infty$. Hence, the Bayes estimators form a consistent sequence of
estimators of $\theta$.


8. It was shown in Exercise 7 of Sec. 7.3 that the posterior distribution of $\theta$ is the normal distribution
with mean 69.07 and variance 0.286.

(a) The Bayes estimate is the mean of this distribution and is equal to 69.07.

(b) The Bayes estimate is the median of the posterior distribution and is therefore again equal to
69.07.

9. For any given values in the random sample, the Bayes estimate of $\theta$ is the mean of the posterior
distribution of $\theta$. Therefore, the mean squared error of the estimate will be the variance of the posterior
distribution of $\theta$. It was shown in Exercise 10 of Sec. 7.3 that this variance will be 0.01 or less for $n \ge 396$.

10. It was shown in Exercise 12 of Sec. 7.3 that the posterior distribution of $\theta$ will be a gamma distribution
with parameters $\alpha = 20.04$ and $\beta = 76.2$. The Bayes estimate is the mean of this distribution and is
equal to $20.04/76.2 = 0.263$.


11. Let $X_1, \ldots, X_n$ denote the observations in the random sample, and let $\alpha$ and $\beta$ denote the parameters
of the prior gamma distribution of $\theta$. It was shown in Theorem 7.3.4 that the posterior distribution of
$\theta$ will be the gamma distribution with parameters $\alpha + n$ and $\beta + n\bar{X}_n$. The Bayes estimator, which is
the mean of this posterior distribution, is therefore

\[
\frac{\alpha + n}{\beta + n\bar{X}_n} = \frac{1 + (\alpha/n)}{\bar{X}_n + (\beta/n)}.
\]

Since the mean of the exponential distribution is $1/\theta$, it follows from the law of large numbers that
$\bar{X}_n$ will converge in probability to $1/\theta$ as $n \to \infty$. It follows, therefore, that the Bayes estimators will
converge in probability to $\theta$ as $n \to \infty$. Hence, the Bayes estimators form a consistent sequence of
estimators of $\theta$.


12. (a) A's prior distribution for $\theta$ is the beta distribution with parameters $\alpha = 2$ and $\beta = 1$. Therefore,
A's posterior distribution for $\theta$ is the beta distribution with parameters $2 + 710 = 712$ and $1 + 290 =
291$. B's prior distribution for $\theta$ is a beta distribution with parameters $\alpha = 4$ and $\beta = 1$.
Therefore, B's posterior distribution for $\theta$ is the beta distribution with parameters $4 + 710 = 714$
and $1 + 290 = 291$.

(b) A's Bayes estimate of $\theta$ is $712/(712 + 291) = 712/1003$. B's Bayes estimate of $\theta$ is $714/(714 + 291) =
714/1005$.

(c) If $y$ denotes the number in the sample who were in favor of the proposition, then A's posterior
distribution for $\theta$ will be the beta distribution with parameters $2 + y$ and $1 + 1000 - y = 1001 - y$,
and B's posterior distribution will be a beta distribution with parameters $4 + y$ and $1 + 1000 - y =
1001 - y$. Therefore, A's Bayes estimate of $\theta$ will be $(2 + y)/1003$ and B's Bayes estimate of $\theta$ will
be $(4 + y)/1005$. But

\[
\left|\frac{4 + y}{1005} - \frac{2 + y}{1003}\right| = \frac{2(1001 - y)}{(1005)(1003)}.
\]

This difference is a maximum when $y = 0$, but even then its value is only

\[
\frac{2(1001)}{(1005)(1003)} < \frac{2}{1000}.
\]
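The bound in part (c) can be verified exhaustively, since $y$ ranges over only the integers 0 through 1000. The sketch below is illustrative only and uses exact rational arithmetic, so no rounding is involved.

```python
from fractions import Fraction

# Absolute difference between B's and A's Bayes estimates for every possible y.
diffs = {
    y: abs(Fraction(4 + y, 1005) - Fraction(2 + y, 1003))
    for y in range(1001)
}

worst_y = max(diffs, key=diffs.get)
worst_diff = diffs[worst_y]
print(worst_y, float(worst_diff))
```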


13. If $\theta$ has the Pareto distribution with parameters $\alpha > 1$ and $x_0 > 0$, then

\[
E(\theta) = \int_{x_0}^\infty \theta \cdot \frac{\alpha x_0^\alpha}{\theta^{\alpha + 1}}\,d\theta = \frac{\alpha}{\alpha - 1}\,x_0.
\]

It was shown in Exercise 18 of Sec. 7.3 that the posterior distribution of $\theta$ will be a Pareto distribution
with parameters $\alpha + n$ and $\max\{x_0, X_1, \ldots, X_n\}$. The Bayes estimator is the mean of this posterior
distribution and is, therefore, equal to $(\alpha + n)\max\{x_0, X_1, \ldots, X_n\}/(\alpha + n - 1)$.

14. Since $\psi = \theta^2$, the posterior distribution of $\psi$ can be derived from the posterior distribution of $\theta$. The
Bayes estimator $\hat{\psi}$ will then be the mean $E(\psi)$ of the posterior distribution of $\psi$. But $E(\psi) = E(\theta^2)$,
where the first expectation is calculated with respect to the posterior distribution of $\psi$ and the second
with respect to the posterior distribution of $\theta$. Since $\hat{\theta}$ is the mean of the posterior distribution of $\theta$, it
is also true that $\hat{\theta} = E(\theta)$. Finally, since the posterior distribution of $\theta$ is a continuous distribution, it
follows from the hint given in this exercise that

\[
\hat{\psi} = E(\theta^2) > [E(\theta)]^2 = \hat{\theta}^2.
\]
15. Let $a_0$ be a $1/(1 + c)$ quantile of the posterior distribution, and let $a_1$ be some other value. Assume
that $a_0 < a_1$. The proof for $a_0 > a_1$ is similar. Let $g(\theta \mid x)$ denote the posterior p.d.f. The posterior
mean of the loss for action $a$ is

\[
h(a) = c\int_{-\infty}^{a} (a - \theta) g(\theta \mid x)\,d\theta + \int_{a}^{\infty} (\theta - a) g(\theta \mid x)\,d\theta.
\]

We shall now show that $h(a_1) \ge h(a_0)$, with strict inequality if $a_1$ is not a $1/(1 + c)$ quantile.

\[
h(a_1) - h(a_0) = c\int_{-\infty}^{a_0} (a_1 - a_0) g(\theta \mid x)\,d\theta
+ \int_{a_0}^{a_1} \left[c a_1 + a_0 - (1 + c)\theta\right] g(\theta \mid x)\,d\theta
+ \int_{a_1}^{\infty} (a_0 - a_1) g(\theta \mid x)\,d\theta.
\tag{S.7.2}
\]

The first integral in (S.7.2) equals $c(a_1 - a_0)/(1 + c)$ because $a_0$ is a $1/(1 + c)$ quantile of the posterior
distribution, and the posterior distribution is continuous. The second integral in (S.7.2) is at least as
large as $(a_0 - a_1)\Pr(a_0 < \theta < a_1 \mid x)$ since $-(1 + c)\theta > -(1 + c)a_1$ for all $\theta$ in that integral. In fact, the
integral will be strictly larger than $(a_0 - a_1)\Pr(a_0 < \theta < a_1 \mid x)$ if this probability is positive. The last
integral in (S.7.2) equals $(a_0 - a_1)\Pr(\theta > a_1 \mid x)$. So

\[
h(a_1) - h(a_0) \ge \frac{c(a_1 - a_0)}{1 + c} + (a_0 - a_1)\Pr(\theta > a_0 \mid x) = 0.
\tag{S.7.3}
\]

The equality follows from the fact that $\Pr(\theta > a_0 \mid x) = c/(1 + c)$. The inequality in (S.7.3) will be strict
if and only if $\Pr(a_0 < \theta < a_1 \mid x) > 0$, which occurs if and only if $a_1$ is not another $1/(1 + c)$ quantile.


7.5 Maximum Likelihood Estimators 


Commentary 


Although maximum likelihood is a popular method of estimation, it can be valuable for the more capable 
students to see some limitations that are described at the end of this section. These limitations arise only in 
more complicated situations than those that are typically encountered in practice. This material is probably 
not suitable for students with a limited mathematical background who are learning statistical inference for 
the first time. 


Solutions to Exercises 


1. We can easily compute

\[
E(Y) = \frac{1}{n}\sum_{i=1}^n x_i = \bar{x}_n,
\qquad
E(Y^2) = \frac{1}{n}\sum_{i=1}^n x_i^2.
\]


2. It was shown in Example 7.5.4 that the M.L.E. is $\bar{x}_n$. In this exercise, $\bar{x}_n = 58/70 = 29/35$.


3. The likelihood function for the given sample is $p^{58}(1 - p)^{12}$. Among all values of $p$ in the interval
$1/2 \le p \le 2/3$, this function is a maximum when $p = 2/3$.
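The restricted maximization in Exercise 3 can be illustrated numerically. The sketch below evaluates the log-likelihood on a grid over $[1/2, 2/3]$; the grid size and the function name `log_lik` are arbitrary choices made for illustration. Because the unrestricted maximizer $58/70$ lies to the right of the interval, the log-likelihood is increasing on the whole interval and the maximum falls at the right endpoint.

```python
import math


def log_lik(p, successes=58, failures=12):
    """Log of the likelihood p**58 * (1 - p)**12."""
    return successes * math.log(p) + failures * math.log(1 - p)


lo, hi = 1 / 2, 2 / 3
# Evenly spaced grid over the restricted parameter space, endpoints included.
grid = [lo + i * (hi - lo) / 1000 for i in range(1001)]
p_hat = max(grid, key=log_lik)
unrestricted = 58 / 70
print(p_hat, unrestricted)
```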


4. Let $y$ denote the sum of the observations in the sample. Then the likelihood function is $p^y(1 - p)^{n - y}$.
If $y = 0$, this function is a decreasing function of $p$. Since $p = 0$ is not a value in the parameter space,
there is no M.L.E. Similarly, if $y = n$, then the likelihood function is an increasing function of $p$. Since
$p = 1$ is not a value in the parameter space, there is no M.L.E.


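5. Let $y$ denote the sum of the observed values $x_1, \ldots, x_n$. Then the likelihood function is

\[
f_n(\boldsymbol{x} \mid \theta) = \frac{\exp(-n\theta)\theta^y}{\prod_{i=1}^n (x_i!)}.
\]

(a) If $y > 0$ and we let $L(\theta) = \log f_n(\boldsymbol{x} \mid \theta)$, then

\[
\frac{\partial}{\partial\theta} L(\theta) = -n + \frac{y}{\theta}.
\]

The maximum of $L(\theta)$ will be attained at the value of $\theta$ for which this derivative is equal to 0. In
this way, we find that $\hat{\theta} = y/n = \bar{x}_n$.

(b) If $y = 0$, then $f_n(\boldsymbol{x} \mid \theta)$ is a decreasing function of $\theta$. Since $\theta = 0$ is not a value in the parameter
space, there is no M.L.E.


6. Let $\theta = \sigma^2$. Then the likelihood function is

\[
f_n(\boldsymbol{x} \mid \theta) = \frac{1}{(2\pi\theta)^{n/2}} \exp\left[-\frac{1}{2\theta}\sum_{i=1}^n (x_i - \mu)^2\right].
\]

If we let $L(\theta) = \log f_n(\boldsymbol{x} \mid \theta)$, then

\[
\frac{\partial}{\partial\theta} L(\theta) = -\frac{n}{2\theta} + \frac{1}{2\theta^2}\sum_{i=1}^n (x_i - \mu)^2.
\]

The maximum of $L(\theta)$ will be attained at a value of $\theta$ for which this derivative is equal to 0. In this
way, we find that

\[
\hat{\theta} = \frac{1}{n}\sum_{i=1}^n (x_i - \mu)^2.
\]

7. Let $y$ denote the sum of the observed values $x_1, \ldots, x_n$. Then the likelihood function is

\[
f_n(\boldsymbol{x} \mid \beta) = \beta^n \exp(-\beta y).
\]

If we let $L(\beta) = \log f_n(\boldsymbol{x} \mid \beta)$, then

\[
\frac{\partial}{\partial\beta} L(\beta) = \frac{n}{\beta} - y.
\]

The maximum of $L(\beta)$ will be attained at the value of $\beta$ for which this derivative is equal to 0. Therefore,
$\hat{\beta} = n/y = 1/\bar{x}_n$.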


8. Let $y$ denote the sum of the observed values $x_1, \ldots, x_n$. Then the likelihood function is

\[
f_n(\boldsymbol{x} \mid \theta) =
\begin{cases}
\exp(n\theta - y) & \text{for } \min\{x_1, \ldots, x_n\} > \theta, \\
0 & \text{otherwise.}
\end{cases}
\]

(a) For each value of $\boldsymbol{x}$, $f_n(\boldsymbol{x} \mid \theta)$ will be a maximum when $\theta$ is made as large as possible subject
to the strict inequality $\theta < \min\{x_1, \ldots, x_n\}$. Therefore, the value $\theta = \min\{x_1, \ldots, x_n\}$ cannot be
used and there is no M.L.E.

(b) Suppose that the p.d.f. given in this exercise is replaced by the following equivalent p.d.f., in which
strict and weak inequalities have been changed:

\[
f(x \mid \theta) =
\begin{cases}
\exp(\theta - x) & \text{for } x \ge \theta, \\
0 & \text{for } x < \theta.
\end{cases}
\]

Then the likelihood function $f_n(\boldsymbol{x} \mid \theta)$ will be nonzero for $\theta \le \min\{x_1, \ldots, x_n\}$ and the M.L.E. will
be $\hat{\theta} = \min\{x_1, \ldots, x_n\}$.

9. If $0 < x_i < 1$ for $i = 1, \ldots, n$, then the likelihood function will be as follows:

\[
f_n(\boldsymbol{x} \mid \theta) = \theta^n \left(\prod_{i=1}^n x_i\right)^{\theta - 1}.
\]

If we let $L(\theta) = \log f_n(\boldsymbol{x} \mid \theta)$, then

\[
\frac{\partial}{\partial\theta} L(\theta) = \frac{n}{\theta} + \sum_{i=1}^n \log x_i.
\]

Therefore, $\hat{\theta} = -n/\sum_{i=1}^n \log x_i$. It should be noted that $\hat{\theta} > 0$.
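10. The likelihood function is

\[
f_n(\boldsymbol{x} \mid \theta) = \frac{1}{2^n} \exp\left\{-\sum_{i=1}^n |x_i - \theta|\right\}.
\]

Therefore, the M.L.E. of $\theta$ will be the value that minimizes $\sum_{i=1}^n |x_i - \theta|$. The solution to this minimization
problem was given in the solution to Exercise 10 of Sec. 4.5.

11. The p.d.f. of each observation can be written as follows:

\[
f(x \mid \theta) =
\begin{cases}
\dfrac{1}{\theta_2 - \theta_1} & \text{for } \theta_1 \le x \le \theta_2, \\
0 & \text{otherwise.}
\end{cases}
\]

Therefore, the likelihood function is

\[
f_n(\boldsymbol{x} \mid \theta_1, \theta_2) = \frac{1}{(\theta_2 - \theta_1)^n}
\]

for $\theta_1 \le \min\{x_1, \ldots, x_n\} \le \max\{x_1, \ldots, x_n\} \le \theta_2$, and $f_n(\boldsymbol{x} \mid \theta_1, \theta_2) = 0$ otherwise. Hence, $f_n(\boldsymbol{x} \mid \theta_1, \theta_2)$
will be a maximum when $\theta_2 - \theta_1$ is made as small as possible. Since the smallest possible value of $\theta_2$ is
$\max\{x_1, \ldots, x_n\}$ and the largest possible value of $\theta_1$ is $\min\{x_1, \ldots, x_n\}$, these values are the M.L.E.'s.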


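12. The likelihood function is

\[
f_n(\boldsymbol{x} \mid \theta_1, \ldots, \theta_k) = \theta_1^{n_1} \cdots \theta_k^{n_k}.
\]

If we let $L(\theta_1, \ldots, \theta_k) = \log f_n(\boldsymbol{x} \mid \theta_1, \ldots, \theta_k)$ and let $\theta_k = 1 - \sum_{i=1}^{k-1} \theta_i$, then

\[
\frac{\partial L(\theta_1, \ldots, \theta_k)}{\partial\theta_i} = \frac{n_i}{\theta_i} - \frac{n_k}{\theta_k} \quad \text{for } i = 1, \ldots, k - 1.
\]

If each of these derivatives is set equal to 0, we obtain the relations

\[
\frac{\theta_1}{n_1} = \frac{\theta_2}{n_2} = \cdots = \frac{\theta_k}{n_k}.
\]

If we let $\theta_i = \alpha n_i$ for $i = 1, \ldots, k$, then

\[
1 = \sum_{i=1}^k \theta_i = \alpha \sum_{i=1}^k n_i = \alpha n.
\]

Hence $\alpha = 1/n$. It follows that $\hat{\theta}_i = n_i/n$ for $i = 1, \ldots, k$.

13. It follows from Eq. (5.10.2) (with $x_1$ and $x_2$ now replaced by $x$ and $y$) that the likelihood function is

\[
f_n(\boldsymbol{x}, \boldsymbol{y} \mid \mu_1, \mu_2) = \frac{1}{\left[2\pi\sigma_1\sigma_2(1 - \rho^2)^{1/2}\right]^n}
\exp\left\{-\frac{1}{2(1 - \rho^2)} \sum_{i=1}^n \left[\left(\frac{x_i - \mu_1}{\sigma_1}\right)^2
- 2\rho\left(\frac{x_i - \mu_1}{\sigma_1}\right)\left(\frac{y_i - \mu_2}{\sigma_2}\right)
+ \left(\frac{y_i - \mu_2}{\sigma_2}\right)^2\right]\right\}.
\]

If we let $L(\mu_1, \mu_2) = \log f_n(\boldsymbol{x}, \boldsymbol{y} \mid \mu_1, \mu_2)$, then

\[
\frac{\partial L(\mu_1, \mu_2)}{\partial\mu_1} = \frac{1}{1 - \rho^2}\left[\frac{1}{\sigma_1^2}\left(\sum_{i=1}^n x_i - n\mu_1\right) - \frac{\rho}{\sigma_1\sigma_2}\left(\sum_{i=1}^n y_i - n\mu_2\right)\right],
\]

\[
\frac{\partial L(\mu_1, \mu_2)}{\partial\mu_2} = \frac{1}{1 - \rho^2}\left[\frac{1}{\sigma_2^2}\left(\sum_{i=1}^n y_i - n\mu_2\right) - \frac{\rho}{\sigma_1\sigma_2}\left(\sum_{i=1}^n x_i - n\mu_1\right)\right].
\]

When these derivatives are set equal to 0, the unique solution is $\mu_1 = \bar{x}_n$ and $\mu_2 = \bar{y}_n$. Hence, these
values are the M.L.E.'s.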


7.6 Properties of Maximum Likelihood Estimators 


Commentary 


The material on sampling plans at the end of this section is a bit more subtle than the rest of the section, 
and should only be introduced to students who are capable of a deeper understanding of the material. 

If you are using the software R, the digamma function mentioned in Example 7.6.4 can be computed with
the function digamma, which takes only one argument. The trigamma function mentioned in Example 7.6.6
can be computed with the function trigamma, which takes only one argument. R also has several functions like
nlm and optim for minimizing general functions. The required arguments to nlm are the name of another R
function with a vector argument over which the minimization is done, and a starting value for the argument.
If the function has additional arguments that remain fixed during the minimization, those can be listed after
the starting vector, but they must be named explicitly. For optim, the first two arguments are reversed.
Both functions have an optional argument hessian which, if set to TRUE, will tell the function to compute a
matrix of numerical second partial derivatives. For example, if we want to minimize a function f(x,y) over
x with y fixed at c(3,1.2) starting from x=x0, we could use
optim(x0,f,y=c(3,1.2)). If we wish to maximize a function g, we can define f to be -g and pass that to
either optim or nlm.


Solutions to Exercises 
1. The M.L.E. of $\exp(-1/\theta)$ is $\exp(-1/\hat{\theta})$, where $\hat{\theta} = -n/\sum_{i=1}^n \log(x_i)$ is the M.L.E. of $\theta$. That is, the
M.L.E. of $\exp(-1/\theta)$ is

\[
\exp\left(\sum_{i=1}^n \log(x_i)/n\right) = \exp\left[\log\left(\prod_{i=1}^n x_i\right)^{1/n}\right] = \left(\prod_{i=1}^n x_i\right)^{1/n}.
\]


2. The standard deviation of the Poisson distribution with mean $\theta$ is $\sigma = \theta^{1/2}$. Therefore, $\hat{\sigma} = \hat{\theta}^{1/2}$. It
was found in Exercise 5 of Sec. 7.5 that $\hat{\theta} = \bar{X}_n$.

3. The median of an exponential distribution with parameter $\beta$ is the number $m$ such that

\[
\int_0^m \beta \exp(-\beta x)\,dx = \frac{1}{2}.
\]

Therefore, $m = (\log 2)/\beta$, and it follows that $\hat{m} = (\log 2)/\hat{\beta}$. It was shown in Exercise 7 of Sec. 7.5
that $\hat{\beta} = 1/\bar{X}_n$.

4. The probability that a given lamp will fail in a period of $T$ hours is $p = 1 - \exp(-\beta T)$, and the
probability that exactly $x$ lamps will fail is $\binom{n}{x} p^x (1 - p)^{n - x}$. It was shown in Example 7.5.4 that
$\hat{p} = x/n$. Since $\beta = -\log(1 - p)/T$, it follows that $\hat{\beta} = -\log(1 - x/n)/T$.


5. Since the mean of the uniform distribution is $\mu = (a + b)/2$, it follows that $\hat{\mu} = (\hat{a} + \hat{b})/2$. It was shown
in Exercise 11 of Sec. 7.5 that $\hat{a} = \min\{X_1, \ldots, X_n\}$ and $\hat{b} = \max\{X_1, \ldots, X_n\}$.


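6. The distribution of $Z = (X - \mu)/\sigma$ will be a standard normal distribution. Therefore,

\[
0.95 = \Pr(X < \theta) = \Pr\left(Z < \frac{\theta - \mu}{\sigma}\right) = \Phi\left(\frac{\theta - \mu}{\sigma}\right).
\]

Hence, from a table of the values of $\Phi$ it is found that $(\theta - \mu)/\sigma = 1.645$. Since $\theta = \mu + 1.645\sigma$, it
follows that $\hat{\theta} = \hat{\mu} + 1.645\hat{\sigma}$. By Example 6.5.4, we have $\hat{\mu} = \bar{X}_n$ and $\hat{\sigma} = \left[\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X}_n)^2\right]^{1/2}$.

7. $\nu = \Pr(X > 2) = \Pr\left(Z > \frac{2 - \mu}{\sigma}\right) = 1 - \Phi\left(\frac{2 - \mu}{\sigma}\right) = \Phi\left(\frac{\mu - 2}{\sigma}\right)$.
Therefore, $\hat{\nu} = \Phi((\hat{\mu} - 2)/\hat{\sigma})$.

8. Let $\delta = \Gamma'(\alpha)/\Gamma(\alpha)$. Then $\hat{\delta} = \Gamma'(\hat{\alpha})/\Gamma(\hat{\alpha})$. It follows from Eq. (7.6.5) that $\hat{\delta} = \sum_{i=1}^n (\log X_i)/n$.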


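9. If we let $y = \sum_{i=1}^n x_i$, then the likelihood function is

\[
f_n(\boldsymbol{x} \mid \alpha, \beta) = \frac{\beta^{n\alpha}}{[\Gamma(\alpha)]^n} \left(\prod_{i=1}^n x_i\right)^{\alpha - 1} \exp(-\beta y).
\]

If we now let $L(\alpha, \beta) = \log f_n(\boldsymbol{x} \mid \alpha, \beta)$, then

\[
L(\alpha, \beta) = n\alpha \log\beta - n\log\Gamma(\alpha) + (\alpha - 1)\log\left(\prod_{i=1}^n x_i\right) - \beta y.
\]

Since $\hat{\alpha}$ and $\hat{\beta}$ must satisfy the equation $\partial L(\alpha, \beta)/\partial\beta = 0$ [as well as the equation $\partial L(\alpha, \beta)/\partial\alpha = 0$], it
follows that $\hat{\alpha}/\hat{\beta} = y/n = \bar{x}_n$.

10. The likelihood function is

\[
f_n(\boldsymbol{x} \mid \alpha, \beta) = \left[\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\right]^n \left(\prod_{i=1}^n x_i\right)^{\alpha - 1} \left[\prod_{i=1}^n (1 - x_i)\right]^{\beta - 1}.
\]

If we let $L(\alpha, \beta) = \log f_n(\boldsymbol{x} \mid \alpha, \beta)$, then

\[
L(\alpha, \beta) = n\log\Gamma(\alpha + \beta) - n\log\Gamma(\alpha) - n\log\Gamma(\beta) + (\alpha - 1)\sum_{i=1}^n \log x_i + (\beta - 1)\sum_{i=1}^n \log(1 - x_i).
\]

Hence,

\[
\frac{\partial L(\alpha, \beta)}{\partial\alpha} = n\frac{\Gamma'(\alpha + \beta)}{\Gamma(\alpha + \beta)} - n\frac{\Gamma'(\alpha)}{\Gamma(\alpha)} + \sum_{i=1}^n \log x_i
\]

and

\[
\frac{\partial L(\alpha, \beta)}{\partial\beta} = n\frac{\Gamma'(\alpha + \beta)}{\Gamma(\alpha + \beta)} - n\frac{\Gamma'(\beta)}{\Gamma(\beta)} + \sum_{i=1}^n \log(1 - x_i).
\]

The estimates $\hat{\alpha}$ and $\hat{\beta}$ must satisfy the equations $\partial L(\alpha, \beta)/\partial\alpha = 0$ and $\partial L(\alpha, \beta)/\partial\beta = 0$. Therefore,
$\hat{\alpha}$ and $\hat{\beta}$ must also satisfy the equation $\partial L(\alpha, \beta)/\partial\alpha = \partial L(\alpha, \beta)/\partial\beta$. This equation reduces to the one
given in the exercise.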


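11. Let $Y_n = \max\{X_1, \ldots, X_n\}$. It was shown in Example 7.5.7 that $\hat{\theta} = Y_n$. Therefore, for $\varepsilon > 0$,

\[
\Pr(|\hat{\theta} - \theta| < \varepsilon) = \Pr(Y_n > \theta - \varepsilon) = 1 - \left(\frac{\theta - \varepsilon}{\theta}\right)^n.
\]

It follows that $\lim_{n \to \infty} \Pr(|\hat{\theta} - \theta| < \varepsilon) = 1$. Therefore, $\hat{\theta} \xrightarrow{p} \theta$.

12. We know that $\hat{\beta} = 1/\bar{X}_n$. Also, since the mean of the exponential distribution is $\mu = 1/\beta$, it follows
from the law of large numbers that $\bar{X}_n \xrightarrow{p} 1/\beta$. Hence, $\hat{\beta} \xrightarrow{p} \beta$.

13. Let $Z_i = -\log X_i$ for $i = 1, \ldots, n$. Then by Exercise 9 of Sec. 7.5, $\hat{\theta} = 1/\bar{Z}_n$. If $X_i$ has the p.d.f.
$f(x \mid \theta)$ specified in that exercise, then the p.d.f. $g(z \mid \theta)$ of $Z_i$ will be as follows, for $z > 0$:

\[
g(z \mid \theta) = f(\exp(-z) \mid \theta)\left|\frac{dx}{dz}\right| = \theta(\exp(-z))^{\theta - 1}\exp(-z) = \theta\exp(-\theta z).
\]

Therefore, $Z_i$ has an exponential distribution with parameter $\theta$. It follows that $E(Z_i) = 1/\theta$. Furthermore,
since $X_1, \ldots, X_n$ form a random sample from a distribution for which the p.d.f. is $f(x \mid \theta)$, it
follows that $Z_1, \ldots, Z_n$ will have the same joint distribution as a random sample from an exponential
distribution with parameter $\theta$. Therefore, by the law of large numbers, $\bar{Z}_n \xrightarrow{p} 1/\theta$. It follows that
$\hat{\theta} \xrightarrow{p} \theta$.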


The M.L.E. p is equal to the proportion of butterflies in the sample that have the special marking, 
regardless of the sampling plan. Therefore, (a)p = 5/43 and (b) p = 3/58. 
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As explained in this section, the likelihood function for the 21 observations is equal to the joint p.d-f. 
of the 20 observations for which the exact value is known, multiplied by the probability exp(—15/,) 
that the 21st observation is greater than 15. If we let y denote the sum of the first 20 observations, 
then the likelihood function is 


can exp(—y/H) exp(—15/1). 


Since y = (20)(6) = 120, this likelihood function reduces to 


1 
720 exp(—135/,1). 


The value of js which maximizes this likelihood function is 4 = 6.75. 
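The maximization in the censored-data solution above can be checked numerically. The following Python sketch compares the closed-form maximizer with a crude grid search. The function names, argument names, and the grid are illustrative assumptions and are not taken from the text.

```python
import math

def censored_exp_loglik(mu, observed_sum, n_observed, censored_sum):
    """Log-likelihood for exponential lifetimes with mean mu.

    Each exactly observed lifetime x contributes (1/mu) * exp(-x/mu); each
    censored lifetime c contributes the survival probability exp(-c/mu).
    """
    return -n_observed * math.log(mu) - (observed_sum + censored_sum) / mu

def censored_exp_mle(observed_sum, n_observed, censored_sum):
    """Closed-form maximizer: total time observed over the number of exact observations."""
    return (observed_sum + censored_sum) / n_observed

# Values from the solution: 20 exact observations with sum 120, one censored at 15.
mu_hat = censored_exp_mle(120.0, 20, 15.0)

# Independent check: grid search over mu in [1.00, 20.99] (hypothetical grid).
grid = [1.0 + 0.01 * k for k in range(2000)]
mu_grid = max(grid, key=lambda m: censored_exp_loglik(m, 120.0, 20, 15.0))

print(mu_hat, round(mu_grid, 2))
```

Both values should agree with the estimate 6.75 found above, up to the grid spacing.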


The likelihood function determined by any observed value x of X is $\theta^3 x^2 \exp(-\theta x)/2$. The likelihood
function determined by any observed value y of Y is $(2\theta)^y \exp(-2\theta)/y!$. Therefore, when $X = 2$ and
$Y = 3$, each of these functions is proportional to $\theta^3 \exp(-2\theta)$. The M.L.E. obtained by either statistician
will be the value of $\theta$ which maximizes this expression. That value is $\hat{\theta} = 3/2$.

The likelihood function determined by any observed value x of X is $\binom{10}{x} p^x (1-p)^{10-x}$. By Eq. (5.5.1),
the likelihood function determined by any observed value y of Y is $\binom{3+y}{y} p^4 (1-p)^y$. Therefore, when
$X = 4$ and $Y = 6$, each of these likelihood functions is proportional to $p^4(1-p)^6$. The M.L.E. obtained
by either statistician will be the value of p which maximizes this expression. That value is $\hat{p} = 2/5$.
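As an illustration of why the two sampling plans lead to the same estimate, the sketch below evaluates both likelihoods on a grid and confirms that they differ only by a constant factor. It assumes the binomial plan has n = 10 trials with x = 4 successes and that the negative binomial plan counts y = 6 failures before the 4th success; the function names and grid are hypothetical.

```python
from math import comb

def binomial_lik(p, x=4, n=10):
    """Likelihood of p when x successes are seen in a fixed number n of trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def neg_binomial_lik(p, y=6, r=4):
    """Likelihood of p when y failures are seen before the r-th success."""
    return comb(r + y - 1, y) * p**r * (1 - p)**y

# Grid of candidate values strictly inside (0, 1).
grid = [k / 1000 for k in range(1, 1000)]
p_binom = max(grid, key=binomial_lik)
p_negbin = max(grid, key=neg_binomial_lik)

# The ratio of the two likelihoods should not depend on p.
ratios = [binomial_lik(p) / neg_binomial_lik(p) for p in (0.2, 0.4, 0.7)]

print(p_binom, p_negbin, ratios)
```

Because the ratio is constant in p, any maximizer of one likelihood is a maximizer of the other.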


The mean of a Bernoulli random variable with parameter p is p. Hence, the method of moments 
estimator is the sample mean, which is also the M.L.E. 


The mean of an exponential random variable with parameter $\beta$ is $1/\beta$, so the method of moments
estimator is one over the sample mean, which is also the M.L.E.

The mean of a Poisson random variable is $\theta$, hence the method of moments estimator of $\theta$ is the sample
mean, which is also the M.L.E.

The M.L.E. of the mean is the sample mean, which is the method of moments estimator. The M.L.E. of
$\sigma^2$ is the mean of the $X_i^2$'s minus the square of the sample mean, which is also the method of moments
estimator of the variance.

The mean of $X_i$ is $\theta/2$, so the method of moments estimator is $2\bar{X}_n$. The M.L.E. is the maximum of
the $X_i$ values.


(a) The means of $X_i$ and $X_i^2$ are respectively $\alpha/(\alpha+\beta)$ and $\alpha(\alpha+1)/[(\alpha+\beta)(\alpha+\beta+1)]$. We set
these equal to the sample moments $\bar{x}_n$ and $\overline{x^2}_n$, and solve for $\alpha$ and $\beta$. After some tedious algebra,
we get

\[ \hat{\alpha} = \frac{\bar{x}_n(\bar{x}_n - \overline{x^2}_n)}{\overline{x^2}_n - \bar{x}_n^2}, \qquad
   \hat{\beta} = \frac{(1-\bar{x}_n)(\bar{x}_n - \overline{x^2}_n)}{\overline{x^2}_n - \bar{x}_n^2}. \]

(b) The M.L.E. involves derivatives of the gamma function and the products $\prod_{i=1}^n x_i$ and $\prod_{i=1}^n (1-x_i)$.


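24. The p.d.f. of each $(X_i, Y_i)$ pair can be factored as

\[ \frac{1}{(2\pi)^{1/2}\sigma_1}\exp\left(-\frac{1}{2\sigma_1^2}(x_i-\mu_1)^2\right)
   \frac{1}{(2\pi)^{1/2}\sigma_{2.1}}\exp\left(-\frac{1}{2\sigma_{2.1}^2}(y_i-\alpha-\beta x_i)^2\right), \tag{S.7.4} \]

where the new parameters are defined in the exercise. The product of n factors of the form (S.7.4) can
be factored into the product of the n first factors times the product of the n second factors, each of
which can be maximized separately because there are no parameters in common. The product of the
first factors is the same as the likelihood of a sample of normal random variables, and the M.L.E.'s are
$\hat{\mu}_1$ and $\hat{\sigma}_1^2$ as stated in the exercise. The product of the second factors is slightly more complicated than
the likelihood from a sample of normal random variables, but not much more so. Take the logarithm
to get

\[ -\frac{n}{2}\left[\log(2\pi) + \log(\sigma_{2.1}^2)\right] - \frac{1}{2\sigma_{2.1}^2}\sum_{i=1}^n (y_i - \alpha - \beta x_i)^2. \tag{S.7.5} \]

Taking the partial derivatives with respect to $\alpha$ and $\beta$ yields

\[ \frac{\partial}{\partial\alpha} = \frac{1}{\sigma_{2.1}^2}\sum_{i=1}^n (y_i - \alpha - \beta x_i), \]
\[ \frac{\partial}{\partial\beta} = \frac{1}{\sigma_{2.1}^2}\sum_{i=1}^n x_i(y_i - \alpha - \beta x_i). \]

Setting the first line equal to 0 and solving for $\alpha$ yields

\[ \alpha = \bar{y}_n - \beta\bar{x}_n. \tag{S.7.6} \]

Plug (S.7.6) into the second of the partial derivatives to get (after a bit of algebra)

\[ \hat{\beta} = \frac{\sum_{i=1}^n (y_i - \bar{y}_n)(x_i - \bar{x}_n)}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}. \tag{S.7.7} \]

Substitute (S.7.7) back into (S.7.6) to get

\[ \hat{\alpha} = \bar{y}_n - \hat{\beta}\bar{x}_n. \]

Next, take the partial derivative of (S.7.5) with respect to $\sigma_{2.1}^2$ to get

\[ -\frac{n}{2\sigma_{2.1}^2} + \frac{1}{2\sigma_{2.1}^4}\sum_{i=1}^n (y_i - \alpha - \beta x_i)^2. \tag{S.7.8} \]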


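Now, substitute both $\hat{\alpha}$ and $\hat{\beta}$ into (S.7.8), set the result equal to 0, and solve for $\sigma_{2.1}^2$. The result is

\[ \hat{\sigma}_{2.1}^2 = \frac{1}{n}\sum_{i=1}^n (y_i - \hat{\alpha} - \hat{\beta}x_i)^2. \]

Finally, we can solve for the M.L.E.'s of the original parameters. We already have $\hat{\mu}_1$ and $\hat{\sigma}_1^2$. The
equation $\alpha = \mu_2 - \rho\sigma_2\mu_1/\sigma_1$ can be rewritten $\alpha = \mu_2 - \beta\mu_1$. It follows that

\[ \hat{\mu}_2 = \hat{\alpha} + \hat{\beta}\hat{\mu}_1 = \bar{y}_n. \]

The equation $\beta = \rho\sigma_2/\sigma_1$ can be rewritten $\rho\sigma_2 = \beta\sigma_1$. Plugging this into $\sigma_{2.1}^2 = (1-\rho^2)\sigma_2^2$ yields
$\sigma_{2.1}^2 = \sigma_2^2 - \beta^2\sigma_1^2$. Hence,

\[ \hat{\sigma}_2^2 = \hat{\sigma}_{2.1}^2 + \hat{\beta}^2\hat{\sigma}_1^2
   = \frac{1}{n}\sum_{i=1}^n (y_i - \hat{\alpha} - \hat{\beta}x_i)^2 + \frac{\left[\sum_{i=1}^n (y_i-\bar{y}_n)(x_i-\bar{x}_n)\right]^2}{n\sum_{i=1}^n (x_i-\bar{x}_n)^2}
   = \frac{1}{n}\sum_{i=1}^n (y_i - \bar{y}_n)^2, \]

where the final equality is tedious but straightforward algebra. Finally,

\[ \hat{\rho} = \frac{\hat{\beta}\hat{\sigma}_1}{\hat{\sigma}_2}
   = \frac{\sum_{i=1}^n (y_i-\bar{y}_n)(x_i-\bar{x}_n)}{\left[\sum_{i=1}^n (x_i-\bar{x}_n)^2\right]^{1/2}\left[\sum_{i=1}^n (y_i-\bar{y}_n)^2\right]^{1/2}}. \]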
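The two-stage argument above (first the regression parameters, then the original parameters) can be traced in a short Python sketch. It is only an illustration under the factorization used in this solution; the function name, the dictionary keys, and the toy data are assumptions.

```python
import math

def bivariate_normal_mle(xs, ys):
    """M.L.E.'s for a bivariate normal sample via the regression factorization.

    Step 1: estimate alpha, beta and sigma_{2.1}^2 from the conditional part.
    Step 2: map back to mu_2, sigma_2^2 and rho using the relations in the solution.
    """
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))

    beta = sxy / sxx                      # (S.7.7)
    alpha = ybar - beta * xbar            # (S.7.6) with beta-hat plugged in
    var_2_1 = sum((y - alpha - beta * x) ** 2 for x, y in zip(xs, ys)) / n

    var1 = sxx / n                        # M.L.E. of sigma_1^2
    mu2 = alpha + beta * xbar             # equals ybar
    var2 = var_2_1 + beta ** 2 * var1     # equals sum((y - ybar)^2) / n
    rho = beta * math.sqrt(var1) / math.sqrt(var2)
    return {"mu1": xbar, "var1": var1, "mu2": mu2, "var2": var2, "rho": rho}

# Toy data (hypothetical) for which the answers are easy to verify by hand.
est = bivariate_normal_mle([1, 2, 3, 4], [2, 1, 4, 3])
print(est)
```

For the toy data, the direct formulas give a second-coordinate variance of 5/4 and a correlation of 3/5, which is what the two-stage computation should reproduce.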


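25. When we observe only the first $n-k$ $Y_i$'s, the M.L.E.'s of $\mu_1$ and $\sigma_1^2$ are not affected. The M.L.E.'s of
$\alpha$, $\beta$ and $\sigma_{2.1}^2$ are just as in the previous exercise but with n replaced by $n-k$. The M.L.E.'s of $\mu_2$, $\sigma_2^2$
and $\rho$ are obtained by substituting $\hat{\alpha}$, $\hat{\beta}$ and $\hat{\sigma}_{2.1}^2$ into the three equations in Exercise 24:

\[ \hat{\mu}_2 = \hat{\alpha} + \hat{\beta}\hat{\mu}_1, \qquad
   \hat{\sigma}_2^2 = \hat{\sigma}_{2.1}^2 + \hat{\beta}^2\hat{\sigma}_1^2, \qquad
   \hat{\rho} = \frac{\hat{\beta}\hat{\sigma}_1}{\hat{\sigma}_2}. \]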


7.7 Sufficient Statistics 


Commentary 


The concept of sufficient statistics is fundamental to much of the traditional theory of statistical inference. 
However, it plays little or no role in the most common practice of statistics. For the most popular distribu- 
tional models for real data, the most obvious data summaries are sufficient statistics. In Bayesian inference, 
the posterior distribution is automatically a function of every sufficient statistic, so one does not even have 
to think about sufficiency in Bayesian inference. For these reasons, the material in Secs. 7.7—7.9 should only 
be covered in courses that place a great deal of emphasis on the mathematical theory of statistics. 


Solutions to Exercises 


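In Exercises 1-11, let t denote the value of the statistic T when the observed values of $X_1,\ldots,X_n$ are
$x_1,\ldots,x_n$. In each exercise, we shall show that T is a sufficient statistic by showing that the joint p.f. or
joint p.d.f. can be factored as in Eq. (7.7.1).

1. The joint p.f. is

\[ f_n(x \mid p) = p^t(1-p)^{n-t}. \]

. The joint p.f. is

\[ f_n(x \mid p) = p^n(1-p)^t. \]

. The joint p.f. is

\[ f_n(x \mid p) = \left[\prod_{i=1}^n \binom{r+x_i-1}{x_i}\right]\left[p^{nr}(1-p)^t\right]. \]

Since the expression inside the first set of square brackets does not depend on the parameter p, it follows
that T is a sufficient statistic for p.

. The joint p.d.f. is

\[ f_n(x \mid \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left\{-\frac{t}{2\sigma^2}\right\}. \]

. The joint p.d.f. is

\[ f_n(x \mid \beta) = \left\{\frac{1}{[\Gamma(\alpha)]^n}\left(\prod_{i=1}^n x_i\right)^{\alpha-1}\right\}\left\{\beta^{n\alpha}\exp(-n\beta t)\right\}. \]

. The joint p.d.f. in this exercise is the same as that given in Exercise 5. However, since the unknown
parameter is now $\alpha$ instead of $\beta$, the appropriate factorization is now as follows:

\[ f_n(x \mid \alpha) = \left\{\exp\left(-\beta\sum_{i=1}^n x_i\right)\right\}\left\{\frac{\beta^{n\alpha}}{[\Gamma(\alpha)]^n}\,t^{\alpha-1}\right\}. \]

. The joint p.d.f. is

\[ f_n(x \mid \alpha) = \left\{\frac{1}{[\Gamma(\beta)]^n}\left[\prod_{i=1}^n (1-x_i)\right]^{\beta-1}\right\}\left\{\left[\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)}\right]^n t^{\alpha-1}\right\}. \]

Therefore, the joint p.f. is

\[ f_n(x \mid \theta) = \begin{cases} 1/\theta^n & \text{for } t \le \theta, \\ 0 & \text{otherwise.} \end{cases} \]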


If the function $h(t,\theta)$ is defined as in Example 7.7.5, with the values of t and $\theta$ now restricted to positive
integers, then it follows that

\[ f_n(x \mid \theta) = \frac{1}{\theta^n}h(t,\theta). \]

. The joint p.d.f. is

\[ f_n(x \mid b) = \frac{h(t,b)}{(b-a)^n}, \]

where h is defined in Example 7.7.5.

The joint p.d.f. is

\[ f_n(x \mid a) = \frac{h(a,t)}{(b-a)^n}, \]

where h is defined in Example 7.7.5.

The joint p.d.f. or joint p.f. is

\[ f_n(x \mid \theta) = \left\{\prod_{j=1}^n b(x_j)\right\}\left\{[a(\theta)]^n\exp[c(\theta)t]\right\}. \]

The likelihood function is

\[ \frac{\alpha^n x_0^{n\alpha}}{\left(\prod_{i=1}^n x_i\right)^{\alpha+1}} \quad \text{for all } x_i \ge x_0. \tag{S.7.9} \]

(a) If $x_0$ is known, $\alpha$ is the parameter, and (S.7.9) has the form $u(x)v[r(x),\alpha]$, with $u(x) = 1$ if all
$x_i \ge x_0$ and 0 if not, $r(x) = \prod_{i=1}^n x_i$, and $v[t,\alpha] = \alpha^n x_0^{n\alpha}/t^{\alpha+1}$. So $\prod_{i=1}^n X_i$ is a sufficient statistic.

(b) If $\alpha$ is known, $x_0$ is the parameter, and (S.7.9) has the form $u(x)v[r(x),x_0]$, with
$u(x) = \alpha^n/\left(\prod_{i=1}^n x_i\right)^{\alpha+1}$, $r(x) = \min\{x_1,\ldots,x_n\}$, and $v[t,x_0] = x_0^{n\alpha}$ if $t \ge x_0$ and 0 if not. Hence
$\min\{X_1,\ldots,X_n\}$ is a sufficient statistic.


The statistic T will be a sufficient statistic for $\theta$ if and only if $f_n(x \mid \theta)$ can be factored as in Eq. (7.7.1).
However, since $r(x)$ can be expressed as a function of $r'(x)$, and conversely, there will be a factorization
of the form given in Eq. (7.7.1) if and only if there is a similar factorization in which the function v
is a function of $r'(x)$ and $\theta$. Therefore, T will be a sufficient statistic if and only if $T'$ is a sufficient
statistic.

This result follows from previous exercises in two different ways. First, by Exercise 6, the statistic
$T' = \prod_{i=1}^n X_i$ is a sufficient statistic. Hence, by Exercise 13, $T = \log T'$ is also a sufficient statistic.
A second way to establish the same result is to note that, by Exercise 24(g) of Sec. 7.3, the gamma
distributions form an exponential family with $d(x) = \log x$. Therefore, by Exercise 11, the statistic
$T = \sum_{i=1}^n d(X_i)$ is a sufficient statistic.

It follows from Exercise 11 and Exercise 24(i) of Sec. 7.3 that the statistic $T' = \sum_{i=1}^n \log(1 - X_i)$ is a
sufficient statistic. Since T is a one-to-one function of $T'$, it follows from Exercise 13 that T is also a
sufficient statistic.


Let $\xi(\theta)$ be a prior p.d.f. for $\theta$. The posterior p.d.f. of $\theta$ is, according to Bayes' theorem,

\[ \xi(\theta \mid x) = \frac{f_n(x \mid \theta)\xi(\theta)}{\int f_n(x \mid \psi)\xi(\psi)\,d\psi}
   = \frac{u(x)v[r(x),\theta]\xi(\theta)}{\int u(x)v[r(x),\psi]\xi(\psi)\,d\psi}
   = \frac{v[r(x),\theta]\xi(\theta)}{\int v[r(x),\psi]\xi(\psi)\,d\psi}, \]

where the second equality uses the factorization criterion. One can see that this last expression depends
on x only through $r(x)$.

First, suppose that T is sufficient. Then the likelihood function from observing $X = x$ is $u(x)v[r(x),\theta]$,
which is proportional to $v[r(x),\theta]$. The likelihood from observing $T = t$ (when $t = r(x)$) is

\[ \sum u(x)v[r(x),\theta] = v[t,\theta]\sum u(x), \tag{S.7.10} \]

where the sums in (S.7.10) are over all x such that $t = r(x)$. Notice that the right side of (S.7.10)
is proportional to $v[t,\theta] = v[r(x),\theta]$. So the two likelihoods are proportional. Next, suppose that the
two likelihoods are proportional. That is, let $f(x \mid \theta)$ be the p.f. of X and let $h(t \mid \theta)$ be the p.f. of T. If
$t = r(x)$ then there exists $c(x)$ such that

\[ f(x \mid \theta) = c(x)h(t \mid \theta). \]

Let $u(x) = c(x)$ and $v[t,\theta] = h(t \mid \theta)$, and apply the factorization criterion to see that T is sufficient.


7.8 Jointly Sufficient Statistics 


Commentary 


Even those instructors who wish to cover the concept of sufficient statistic in Sec. 7.7, may decide not to 
cover jointly sufficient statistics. This material is at a slightly more mathematical level than most of the text. 


Solutions to Exercises 


In Exercises 1-4, let $t_1$ and $t_2$ denote the values of $T_1$ and $T_2$ when the observed values of $X_1,\ldots,X_n$ are
$x_1,\ldots,x_n$. In each exercise, we shall show that $T_1$ and $T_2$ are jointly sufficient statistics by showing that the
joint p.d.f. of $X_1,\ldots,X_n$ can be factored as in Eq. (7.8.1).

1. The joint p.d.f. is

\[ f_n(x \mid \alpha,\beta) = \frac{\beta^{n\alpha}}{[\Gamma(\alpha)]^n}\,t_1^{\alpha-1}\exp(-\beta t_2). \]

. The joint p.d.f. is

\[ f_n(x \mid \alpha,\beta) = \left[\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\right]^n t_1^{\alpha-1}t_2^{\beta-1}. \]

. Let the function h be as defined in Example 7.8.4. Then the joint p.d.f. can be written in the following
form:

\[ f_n(x \mid x_0,\alpha) = \frac{(\alpha x_0^{\alpha})^n}{t_2^{\alpha+1}}\,h(x_0,t_1). \]

. Again let the function h be as defined in Example 7.8.4. Then the joint p.d.f. can be written as follows:

\[ f_n(x \mid \theta) = \frac{1}{3^n}\,h(\theta,t_1)\,h(t_2,\theta+3). \]

The joint p.d.f. of the vectors $(X_i, Y_i)$, for $i = 1,\ldots,n$, was given in Eq. (5.10.2). The following relations
hold:

\[ \sum_{i=1}^n (x_i-\mu_1)^2 = \sum_{i=1}^n x_i^2 - 2\mu_1\sum_{i=1}^n x_i + n\mu_1^2, \]
\[ \sum_{i=1}^n (y_i-\mu_2)^2 = \sum_{i=1}^n y_i^2 - 2\mu_2\sum_{i=1}^n y_i + n\mu_2^2, \]
\[ \sum_{i=1}^n (x_i-\mu_1)(y_i-\mu_2) = \sum_{i=1}^n x_iy_i - \mu_2\sum_{i=1}^n x_i - \mu_1\sum_{i=1}^n y_i + n\mu_1\mu_2. \]

Because of these relations, it can be seen that the joint p.d.f. depends on the observed values of the
n vectors in the sample only through the values of the five statistics given in this exercise. Therefore,
they are jointly sufficient statistics.

. The joint p.d.f. or joint p.f. is

\[ f_n(x \mid \theta) = \left\{\prod_{j=1}^n b(x_j)\right\}\left\{[a(\theta)]^n\exp\left[\sum_{i=1}^k c_i(\theta)t_i\right]\right\}. \]

It follows from the factorization criterion that $T_1,\ldots,T_k$ are jointly sufficient statistics for $\theta$.


. In each part of this exercise we shall first present the p.d.f. f, and then we shall identify the functions
a, b, $c_1$, $d_1$, $c_2$, and $d_2$ in the form for a two-parameter exponential family given in Exercise 6.

(a) Let $\theta = (\mu,\sigma^2)$. Then $f(x \mid \theta)$ is as given in the solution to Exercise 24(d) of Sec. 7.3. Therefore,
$a(\theta) = \frac{1}{(2\pi\sigma^2)^{1/2}}\exp\left(-\frac{\mu^2}{2\sigma^2}\right)$, $b(x) = 1$, $c_1(\theta) = -\frac{1}{2\sigma^2}$, $d_1(x) = x^2$, $c_2(\theta) = \frac{\mu}{\sigma^2}$, $d_2(x) = x$.

(b) Let $\theta = (\alpha,\beta)$. Then $f(x \mid \theta)$ is as given in the solution to Exercise 24(f) of Sec. 7.3. Therefore,
$a(\theta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}$, $b(x) = 1$, $c_1(\theta) = \alpha - 1$, $d_1(x) = \log x$, $c_2(\theta) = -\beta$, $d_2(x) = x$.

(c) Let $\theta = (\alpha,\beta)$. Then $f(x \mid \theta)$ is as given in the solution to Exercise 24(h) of Sec. 7.3. Therefore,
$a(\theta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}$, $b(x) = 1$, $c_1(\theta) = \alpha - 1$, $d_1(x) = \log x$, $c_2(\theta) = \beta - 1$, and $d_2(x) = \log(1-x)$.

. The M.L.E. of $\beta$ is $n/\sum_{i=1}^n X_i$. (See Exercise 7 in Sec. 7.5.) This is a one-to-one function of the
sufficient statistic found in Exercise 5 of Sec. 7.7. Hence, the M.L.E. is sufficient. This makes it
minimal sufficient.

. By Example 7.5.4, $\hat{p} = \bar{X}_n$. By Exercise 1 of Sec. 7.7, $\hat{p}$ is a sufficient statistic. Therefore, $\hat{p}$ is a minimal
sufficient statistic.

By Example 7.5.7, $\hat{\theta} = \max\{X_1,\ldots,X_n\}$. By Example 7.7.5, $\hat{\theta}$ is a sufficient statistic. Therefore, $\hat{\theta}$ is
a minimal sufficient statistic.

By Example 7.8.5, the order statistics are minimal jointly sufficient statistics. Therefore, the M.L.E.
of $\theta$, all by itself, cannot be a sufficient statistic. (We know from Example 7.6.5 that there is no
simple expression for this M.L.E., so we cannot solve this exercise by first deriving the M.L.E. and then
checking to see whether it is a sufficient statistic.)

If we let $T = \max\{X_1,\ldots,X_n\}$, let t denote the observed value of T, and let the function h be as
defined in Example 7.8.4, then the likelihood function can be written as follows:

\[ f_n(x \mid \theta) = \frac{2^n\left(\prod_{i=1}^n x_i\right)}{\theta^{2n}}\,h(t,\theta). \]

This function will be a maximum when $\theta$ is chosen as small as possible, subject to the constraint that
$h(t,\theta) = 1$. Therefore, the M.L.E. of $\theta$ is $\hat{\theta} = t$. The median m of the distribution will be the value
such that

\[ \int_0^m f(x \mid \theta)\,dx = \frac{1}{2}. \]

Hence, $m = \theta/\sqrt{2}$. It follows from the invariance principle that the M.L.E. of m is $\hat{m} = \hat{\theta}/\sqrt{2} = t/\sqrt{2}$.

By applying the factorization criterion to $f_n(x \mid \theta)$, it can be seen that the statistic T is a sufficient
statistic for $\theta$. Therefore, the statistic $T/\sqrt{2}$, which is the M.L.E. of m, is also a sufficient statistic for
$\theta$.


By Exercise 11 of Sec. 7.5, $\hat{a} = \min\{X_1,\ldots,X_n\}$ and $\hat{b} = \max\{X_1,\ldots,X_n\}$. By Example 7.8.4, $\hat{a}$ and $\hat{b}$
are jointly sufficient statistics. Therefore, $\hat{a}$ and $\hat{b}$ are minimal jointly sufficient statistics.

It can be shown that the values of the five M.L.E.'s given in Exercise 24 of Sec. 7.6 can be derived from
the values of the five statistics given in Exercise 5 of this section by a one-to-one transformation. Since
the five statistics in Exercise 5 are jointly sufficient statistics, the five M.L.E.'s are also jointly sufficient
statistics. Hence, the M.L.E.'s will be minimal jointly sufficient statistics.

The Bayes estimator of p is given by Eq. (7.4.5). Since $\sum_{i=1}^n x_i$ is a sufficient statistic for p, the Bayes
estimator is also a sufficient statistic for p. Hence, this estimator will be a minimal sufficient statistic.

It follows from Theorem 7.3.2 that the Bayes estimator of $\lambda$ is $(\alpha + \sum_{i=1}^n X_i)/(\beta + n)$. Since $\sum_{i=1}^n X_i$ is
a sufficient statistic for $\lambda$, the Bayes estimator is also a sufficient statistic for $\lambda$. Hence, this estimator
will be a minimal sufficient statistic.

The Bayes estimator of $\mu$ is given by Eq. (7.4.6). Since $\bar{X}_n$ is a sufficient statistic for $\mu$, the Bayes
estimator is also a sufficient statistic. Hence, this estimator will be a minimal sufficient statistic.


7.9 Improving an Estimator


Commentary 


If you decided to cover the material in Secs. 7.7 and 7.8, this section gives one valuable application of that 
material, Theorem 7.9.1 of Blackwell and Rao. This section ends with some foundational discussion of the 
use of sufficient statistics. This material is included for those instructors who want their students to have 
both a working and a critical understanding of the topic. 


Solutions to Exercises 


1. The statistic $Y_n = \sum_{i=1}^n X_i^2$ is a sufficient statistic for $\theta$. Since the value of the estimator $\delta_1$ cannot be
determined from the value of $Y_n$ alone, $\delta_1$ is inadmissible.

. A sufficient statistic in this example is $\max\{X_1,\ldots,X_n\}$. Since $2\bar{X}_n$ is not a function of the sufficient
statistic, it cannot be admissible.

. The mean of the uniform distribution on the interval $[0,\theta]$ is $\theta/2$ and the variance is $\theta^2/12$. Therefore,
$E_\theta(\bar{X}_n) = \theta/2$ and $\mathrm{Var}_\theta(\bar{X}_n) = \theta^2/(12n)$. In turn, it now follows that

\[ E_\theta(\delta_1) = \theta \quad\text{and}\quad \mathrm{Var}_\theta(\delta_1) = \frac{\theta^2}{3n}. \]

Hence, for $\theta > 0$,

\[ R(\theta,\delta_1) = E_\theta[(\delta_1-\theta)^2] = \mathrm{Var}_\theta(\delta_1) = \frac{\theta^2}{3n}. \]

(a) It follows from the discussion given in Sec. 3.9 that the p.d.f. of $Y_n$ is as follows:

\[ g(y \mid \theta) = \begin{cases} \dfrac{ny^{n-1}}{\theta^n} & \text{for } 0 \le y \le \theta, \\ 0 & \text{otherwise.} \end{cases} \]

Hence, for $\theta > 0$,

\[ R(\theta,\delta_2) = E_\theta[(Y_n-\theta)^2] = \int_0^\theta (y-\theta)^2\frac{ny^{n-1}}{\theta^n}\,dy = \frac{2\theta^2}{(n+1)(n+2)}. \]

(b) If $n = 2$, $R(\theta,\delta_1) = R(\theta,\delta_2) = \theta^2/6$.

(c) Suppose $n \ge 3$. Then $R(\theta,\delta_2) < R(\theta,\delta_1)$ for any given value of $\theta > 0$ if and only if
$2/[(n+1)(n+2)] < 1/(3n)$ or, equivalently, if and only if $6n < (n+1)(n+2) = n^2 + 3n + 2$. Hence,
$R(\theta,\delta_2) < R(\theta,\delta_1)$ if and only if $n^2 - 3n + 2 > 0$ or, equivalently, if and only if $(n-2)(n-1) > 0$.
Since this inequality is satisfied for all values of $n \ge 3$, it follows that $R(\theta,\delta_2) < R(\theta,\delta_1)$ for every
value of $\theta > 0$. Hence, $\delta_2$ dominates $\delta_1$.


. For any constant c,

\[ R(\theta,cY_n) = E_\theta[(cY_n-\theta)^2] = c^2E_\theta(Y_n^2) - 2c\theta E_\theta(Y_n) + \theta^2
   = \left(\frac{n}{n+2}c^2 - \frac{2n}{n+1}c + 1\right)\theta^2. \]

Hence, for any given value of n and any given value of $\theta > 0$, $R(\theta,cY_n)$ will be a minimum when c is
chosen so that the coefficient of $\theta^2$ in this expression is a minimum. By differentiating with respect to
c, we find that the minimizing value of c is $c = (n+2)/(n+1)$. Hence, the estimator $(n+2)Y_n/(n+1)$
dominates every other estimator of the form $cY_n$.
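The risk comparisons in the last few solutions can be verified with exact rational arithmetic. The sketch below expresses each risk in units of $\theta^2$, so that $\theta$ drops out; the function names are illustrative assumptions, and the formulas are the ones derived above for the uniform distribution on $[0,\theta]$.

```python
from fractions import Fraction

def risk_twice_mean(n):
    """R(theta, 2 * sample mean) / theta^2 = 1 / (3n)."""
    return Fraction(1, 3 * n)

def risk_scaled_max(n, c):
    """R(theta, c * Y_n) / theta^2 = c^2 n/(n+2) - 2 c n/(n+1) + 1."""
    c = Fraction(c)
    return c * c * Fraction(n, n + 2) - 2 * c * Fraction(n, n + 1) + 1

def best_multiplier(n):
    """Value of c that minimizes the quadratic in risk_scaled_max."""
    return Fraction(n + 2, n + 1)

for n in (2, 3, 5, 10):
    c_star = best_multiplier(n)
    print(n, risk_twice_mean(n), risk_scaled_max(n, 1), risk_scaled_max(n, c_star))
```

With c = 1 the second function reduces to $2/[(n+1)(n+2)]$, and at the minimizing c it reduces to $1/(n+1)^2$.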


. It was shown in Exercise 6 of Sec. 7.7 that $\prod_{i=1}^n X_i$ is a sufficient statistic in this problem. Since the
value of $\bar{X}_n$ cannot be determined from the value of the sufficient statistic alone, $\bar{X}_n$ is inadmissible.

(a) Since the value of $\delta$ is always 3, $R(\beta,\delta) = (\beta - 3)^2$.

(b) Since $R(3,\delta) = 0$, no other estimator $\delta_1$ can dominate $\delta$ unless it is also true that $R(3,\delta_1) = 0$.
But the only way that the M.S.E. of an estimator $\delta_1$ can be 0 is for the estimator $\delta_1$ to be equal
to 3 with probability 1. In other words, the estimator $\delta_1$ must be the same as the estimator $\delta$.
Therefore, $\delta$ is not dominated by any other estimator and it is admissible.
In other words, the estimator that always estimates the value of $\beta$ to be 3 is admissible because it
is the best estimator to use if $\beta$ happens to be equal to 3. Of course, it is a poor estimator to use
if $\beta$ happens to be different from 3.

8. It was shown in Example 7.7.2 that $\sum_{i=1}^n X_i$ is a sufficient statistic in this problem. Since the proportion
$\hat{\beta}$ of observations that have the value 0 cannot be determined from the value of the sufficient statistic
alone, $\hat{\beta}$ is inadmissible.


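9. Suppose that X has a continuous distribution for which the p.d.f. is f. Then

\[ E(X) = \int_{-\infty}^0 xf(x)\,dx + \int_0^\infty xf(x)\,dx. \]

Suppose first that $E(X) \le 0$. Then

\[ |E(X)| = -E(X) = \int_{-\infty}^0 (-x)f(x)\,dx - \int_0^\infty xf(x)\,dx
   \le \int_{-\infty}^0 (-x)f(x)\,dx + \int_0^\infty xf(x)\,dx
   = \int_{-\infty}^\infty |x|f(x)\,dx = E(|X|). \]

A similar proof can be given if X has a discrete distribution or a more general type of distribution, or
if $E(X) > 0$.

Alternatively, the result is immediate from Jensen's inequality, Theorem 4.2.5.

10. We shall follow the steps of the proof of Theorem 7.9.1. It follows from Exercise 9 that

\[ E_\theta(|\delta-\theta| \mid T) \ge |E_\theta(\delta-\theta \mid T)| = |E_\theta(\delta \mid T) - \theta| = |\delta_0 - \theta|. \]

Therefore,

\[ E_\theta(|\delta_0-\theta|) \le E_\theta[E_\theta(|\delta-\theta| \mid T)] = E_\theta(|\delta-\theta|). \]

11. Since $\hat{\theta}$ is the M.L.E. of $\theta$, we know from the discussion in Sec. 7.8 that $\hat{\theta}$ is a function of T alone.
Since $\hat{\theta}$ is already a function of T, taking the conditional expectation $E(\hat{\theta} \mid T)$ will not affect $\hat{\theta}$. Hence,
$\delta_0 = E(\hat{\theta} \mid T) = \hat{\theta}$.

12. Since $X_1$ must be either 0 or 1,

\[ E(X_1 \mid T = t) = \Pr(X_1 = 1 \mid T = t). \]

If $t = 0$ then every $X_i$ must be 0. Therefore,

\[ E(X_1 \mid T = 0) = 0. \]

Suppose now that $t > 0$. Then

\[ \Pr(X_1 = 1 \mid T = t) = \frac{\Pr(X_1 = 1 \text{ and } T = t)}{\Pr(T = t)}
   = \frac{\Pr\left(X_1 = 1 \text{ and } \sum_{i=2}^n X_i = t-1\right)}{\Pr(T = t)}. \]

Since $X_1$ and $\sum_{i=2}^n X_i$ are independent,

\[ \Pr\left(X_1 = 1 \text{ and } \sum_{i=2}^n X_i = t-1\right) = \Pr(X_1 = 1)\Pr\left(\sum_{i=2}^n X_i = t-1\right)
   = p\binom{n-1}{t-1}p^{t-1}(1-p)^{n-t} = \binom{n-1}{t-1}p^t(1-p)^{n-t}. \]

Also, $\Pr(T = t) = \binom{n}{t}p^t(1-p)^{n-t}$. It follows that

\[ \Pr(X_1 = 1 \mid T = t) = \binom{n-1}{t-1}\Big/\binom{n}{t} = \frac{t}{n}. \]

Therefore, for any value of T,

\[ E(X_1 \mid T) = T/n = \bar{X}_n. \]

A more elegant way to solve this exercise is as follows: By symmetry, $E(X_1 \mid T) = E(X_2 \mid T) = \cdots = E(X_n \mid T)$.
Therefore, $nE(X_1 \mid T) = \sum_{i=1}^n E(X_i \mid T)$. But

\[ \sum_{i=1}^n E(X_i \mid T) = E\left(\sum_{i=1}^n X_i \,\Big|\, T\right) = E(T \mid T) = T. \]

Hence, $E(X_1 \mid T) = T/n$.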
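The conditional expectation just derived can be confirmed by brute force for small n. The following sketch enumerates all binary vectors of length n and uses exact fractions; the function name and the test values of n, t and p are assumptions chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

def cond_mean_first(n, t, p):
    """E(X_1 | T = t) for n i.i.d. Bernoulli(p) variables, by direct enumeration."""
    p = Fraction(p)
    numerator = Fraction(0)
    denominator = Fraction(0)
    for x in product((0, 1), repeat=n):
        if sum(x) != t:
            continue
        # Every vector with t ones has the same probability p^t (1-p)^(n-t).
        prob = p ** t * (1 - p) ** (n - t)
        denominator += prob
        numerator += x[0] * prob
    return numerator / denominator

print(cond_mean_first(5, 2, Fraction(1, 3)))  # should equal t/n regardless of p
print(cond_mean_first(5, 2, Fraction(3, 4)))
```

The result does not depend on p, which is the sense in which T is sufficient here.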


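13. We shall carry out the analysis for $Y_1$. The analysis for every other value of i is similar. Since $Y_1$ must
be 0 or 1,

\[ E(Y_1 \mid T = t) = \Pr(Y_1 = 1 \mid T = t) = \Pr(X_1 = 0 \mid T = t)
   = \frac{\Pr(X_1 = 0 \text{ and } T = t)}{\Pr(T = t)}
   = \frac{\Pr\left(X_1 = 0 \text{ and } \sum_{i=2}^n X_i = t\right)}{\Pr(T = t)}. \]

The random variables $X_1$ and $\sum_{i=2}^n X_i$ are independent, $X_1$ has a Poisson distribution with mean $\theta$,
and $\sum_{i=2}^n X_i$ has a Poisson distribution with mean $(n-1)\theta$. Therefore,

\[ \Pr\left(X_1 = 0 \text{ and } \sum_{i=2}^n X_i = t\right) = \Pr(X_1 = 0)\Pr\left(\sum_{i=2}^n X_i = t\right)
   = \exp(-\theta)\cdot\frac{\exp(-(n-1)\theta)[(n-1)\theta]^t}{t!}. \]

Also, since T has a Poisson distribution with mean $n\theta$,

\[ \Pr(T = t) = \frac{\exp(-n\theta)(n\theta)^t}{t!}. \]

It now follows that $E(Y_1 \mid T = t) = ([n-1]/n)^t$.

If $Y_i$ is defined as in Exercise 13 for $i = 1,\ldots,n$, then $\hat{\beta} = \sum_{i=1}^n Y_i/n$. Also, we know from Exercise 13
that $E(Y_i \mid T) = ([n-1]/n)^T$ for $i = 1,\ldots,n$. Therefore,

\[ E(\hat{\beta} \mid T) = E\left(\frac{1}{n}\sum_{i=1}^n Y_i \,\Big|\, T\right) = \frac{1}{n}\sum_{i=1}^n E(Y_i \mid T)
   = \frac{1}{n}\cdot n\left(\frac{n-1}{n}\right)^T = \left(\frac{n-1}{n}\right)^T. \]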
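The conditional probability used in the last two solutions can be checked numerically from the Poisson probabilities themselves. In the sketch below, the function names and the particular values of n, t and $\theta$ are assumptions; the point is only that the ratio does not depend on $\theta$ and equals $([n-1]/n)^t$.

```python
from math import exp, factorial

def poisson_pmf(k, mean):
    """Poisson probability of the value k for the given mean."""
    return exp(-mean) * mean ** k / factorial(k)

def cond_prob_x1_zero(n, t, theta):
    """Pr(X_1 = 0 | T = t) computed from the independent Poisson pieces."""
    joint = poisson_pmf(0, theta) * poisson_pmf(t, (n - 1) * theta)
    return joint / poisson_pmf(t, n * theta)

def improved_estimate(n, t):
    """Value of E(beta-hat | T = t), namely ((n - 1) / n) ** t."""
    return (1 - 1 / n) ** t

for theta in (0.7, 2.0):
    print(theta, cond_prob_x1_zero(5, 3, theta), improved_estimate(5, 3))
```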


Let $\hat{\theta}$ be the M.L.E. of $\theta$. Then the M.L.E. of $\exp(\theta + 0.125)$ is $\exp(\hat{\theta} + 0.125)$. The M.L.E. of $\theta$ is $\bar{X}_n$,
so the M.L.E. of $\exp(\theta + 0.125)$ is $\exp(\bar{X}_n + 0.125)$. The M.S.E. of an estimator of the form $\exp(\bar{X}_n + c)$
is

\[ E\left[(\exp[\bar{X}_n + c] - \exp[\theta + 0.125])^2\right] \]
\[ = \mathrm{Var}(\exp[\bar{X}_n + c]) + \left[E(\exp[\bar{X}_n + c]) - \exp(\theta + 0.125)\right]^2 \]
\[ = \exp(2\theta + 0.25/n + 2c)[\exp(0.25/n) - 1] + [\exp(\theta + 0.125/n + c) - \exp(\theta + 0.125)]^2 \]
\[ = \exp(2\theta)\{\exp(0.25/n + 2c)[\exp(0.25/n) - 1] + \exp(0.25/n + 2c) - 2\exp(0.125[1 + 1/n] + c) + \exp(0.25)\} \]
\[ = \exp(2\theta)\left[\exp(2c)\exp(0.5/n) - 2\exp(c)\exp(0.125[1 + 1/n]) + \exp(0.25)\right]. \]

Let $a = \exp(c)$ in this last expression. Then we can minimize the M.S.E. simultaneously for all $\theta$ by
minimizing

\[ a^2\exp(0.5/n) - 2a\exp(0.125[1 + 1/n]) + \exp(0.25). \]

The minimum occurs at $a = \exp(0.125 - 0.375/n)$, so $c = 0.125 - 0.375/n$.
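The minimizing value of c can be checked numerically. The sketch below assumes, as in the derivation above, that $\bar{X}_n$ is normal with mean $\theta$ and variance $0.25/n$, and it works with the M.S.E. divided by $\exp(2\theta)$ so that $\theta$ plays no role. The function names and default arguments are illustrative.

```python
import math

def scaled_mse(c, n, var=0.25, offset=0.125):
    """M.S.E. of exp(Xbar_n + c) for estimating exp(theta + offset), divided by exp(2*theta).

    Uses E[exp(k * Xbar_n)] = exp(k*theta + k**2 * var / (2*n)) for a normal
    sample mean with variance var / n.
    """
    v = var / n
    return (math.exp(2 * c + 2 * v)
            - 2 * math.exp(c + v / 2 + offset)
            + math.exp(2 * offset))

def optimal_c(n, var=0.25, offset=0.125):
    """Minimizer in c, obtained by setting the derivative in a = exp(c) to zero."""
    return offset - 1.5 * var / n

n = 10
c_star = optimal_c(n)  # 0.125 - 0.375 / n
print(c_star, scaled_mse(c_star, n), scaled_mse(0.125, n))
```

The last printed value is the scaled M.S.E. of the M.L.E. itself (c = 0.125), which should be slightly larger than the value at the minimizing c.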


$p = \Pr(X_1 = 1 \mid \theta) = \exp(-\theta)\theta$. The M.L.E. of $\theta$ is the number of arrivals divided by the observation
time, namely $\bar{X}_n$. So, the M.L.E. of p is $\exp(-\bar{X}_n)\bar{X}_n$. In Example 7.9.2, $T/n = \bar{X}_n$. If n is large,
then T should also be large so that $(1 - 1/n)^T \approx \exp(-T/n)$ according to Theorem 5.3.3.


7.10 Supplementary Exercises 


Solutions to Exercises 


1. (a) The prior distribution of $\theta$ is the beta distribution with parameters 1 and 1, so the posterior
distribution of $\theta$ is the beta distribution with parameters $1 + 10 = 11$ and $1 + 25 - 10 = 16$.

(b) With squared error loss, the estimate to use is the posterior mean, which is 11/27 in this case.

2. We know that the M.L.E. of $\theta$ is $\hat{\theta} = \bar{X}_n$. Hence, by the invariance property described in Sec. 7.6, the
M.L.E. of $\theta^2$ is $\bar{X}_n^2$.

. The prior distribution of $\theta$ is the beta distribution with $\alpha = 3$ and $\beta = 4$, so it follows from Theorem 7.3.1
that the posterior distribution is the beta distribution with $\alpha = 3 + 3 = 6$ and $\beta = 4 + 7 = 11$. The
Bayes estimate is the mean of this posterior distribution, namely 6/17.

. Since the joint p.d.f. of the observations is equal to $1/\theta^n$ provided that $\theta \le x_i \le 2\theta$ for $i = 1,\ldots,n$,
the M.L.E. will be the smallest value of $\theta$ that satisfies these restrictions. Since we can rewrite the
restrictions in the form

\[ \frac{1}{2}\max\{x_1,\ldots,x_n\} \le \theta \le \min\{x_1,\ldots,x_n\}, \]

it follows that the smallest possible value of $\theta$ is

\[ \hat{\theta} = \frac{1}{2}\max\{x_1,\ldots,x_n\}. \]


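. The joint p.d.f. of $X_1$ and $X_2$ is

\[ \frac{1}{2\pi\sigma_1\sigma_2}\exp\left[-\frac{1}{2\sigma_1^2}(x_1 - b_1\mu)^2 - \frac{1}{2\sigma_2^2}(x_2 - b_2\mu)^2\right]. \]

If we let $L(\mu)$ denote the logarithm of this expression, and solve the equation $dL(\mu)/d\mu = 0$, we find
that

\[ \hat{\mu} = \frac{\sigma_2^2b_1x_1 + \sigma_1^2b_2x_2}{\sigma_2^2b_1^2 + \sigma_1^2b_2^2}. \]

. Since $\Gamma(\alpha+1) = \alpha\Gamma(\alpha)$, it follows that $\Gamma'(\alpha+1) = \alpha\Gamma'(\alpha) + \Gamma(\alpha)$. Hence,

\[ \psi(\alpha+1) = \frac{\Gamma'(\alpha+1)}{\Gamma(\alpha+1)} = \frac{\alpha\Gamma'(\alpha)}{\Gamma(\alpha+1)} + \frac{\Gamma(\alpha)}{\Gamma(\alpha+1)}
   = \frac{\Gamma'(\alpha)}{\Gamma(\alpha)} + \frac{1}{\alpha} = \psi(\alpha) + \frac{1}{\alpha}. \]

. The joint p.d.f. of $X_1, X_2, X_3$ is

\[ f(x \mid \theta) = \frac{1}{\theta}\exp\left(-\frac{x_1}{\theta}\right)\cdot\frac{1}{2\theta}\exp\left(-\frac{x_2}{2\theta}\right)\cdot\frac{1}{3\theta}\exp\left(-\frac{x_3}{3\theta}\right)
   = \frac{1}{6\theta^3}\exp\left[-\left(x_1 + \frac{x_2}{2} + \frac{x_3}{3}\right)\frac{1}{\theta}\right]. \]

(a) By solving the equation $\partial\log(f)/\partial\theta = 0$, we find that

\[ \hat{\theta} = \frac{1}{3}\left(X_1 + \frac{1}{2}X_2 + \frac{1}{3}X_3\right). \]

(b) In terms of $\psi = 1/\theta$, the joint p.d.f. of $X_1, X_2, X_3$ is

\[ f(x \mid \psi) = \frac{\psi^3}{6}\exp\left[-\left(x_1 + \frac{1}{2}x_2 + \frac{1}{3}x_3\right)\psi\right]. \]

Since the prior p.d.f. of $\psi$ is

\[ \xi(\psi) \propto \psi^{\alpha-1}\exp(-\beta\psi), \]

it follows that the posterior p.d.f. is

\[ \xi(\psi \mid x) \propto f(x \mid \psi)\xi(\psi) \propto \psi^{\alpha+2}\exp\left[-\left(\beta + x_1 + \frac{1}{2}x_2 + \frac{1}{3}x_3\right)\psi\right]. \]

Hence, the posterior distribution of $\psi$ is the gamma distribution with parameters $\alpha + 3$ and
$\beta + x_1 + x_2/2 + x_3/3$.

8. The joint p.f. of $X_2,\ldots,X_{n+1}$ is the product of n factors. If $X_i = x_i$ and $X_{i+1} = x_{i+1}$, then the ith
factor will be the transition probability of going from state $x_i$ to state $x_{i+1}$ $(i = 1,\ldots,n)$. Hence, each
transition from $s_1$ to $s_1$ will introduce the factor $\theta$, each transition from $s_1$ to $s_2$ will introduce the
factor $1 - \theta$, and every other transition will introduce either the constant factor 3/4 or the constant
factor 1/4. Hence, if A denotes the number of transitions from $s_1$ to $s_1$ among the n transitions and B
denotes the number from $s_1$ to $s_2$, then the joint p.f. of $X_2,\ldots,X_{n+1}$ has the form (const.) $\theta^A(1-\theta)^B$.
Therefore, this joint p.f., or likelihood function, is maximized when $\theta = A/(A+B)$.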
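The counting argument above translates directly into code. The sketch below takes an observed path of states, counts the transitions out of the first state, and returns the maximizing value $A/(A+B)$. The state labels, the function name, and the example path are hypothetical.

```python
def mle_theta_from_path(path, s1="s1", s2="s2"):
    """Return A / (A + B), where A counts s1 -> s1 and B counts s1 -> s2 transitions.

    Transitions that start in s2 carry only constant factors in the likelihood,
    so they are ignored.
    """
    a = 0
    b = 0
    for current, following in zip(path, path[1:]):
        if current == s1 and following == s1:
            a += 1
        elif current == s1 and following == s2:
            b += 1
    if a + b == 0:
        raise ValueError("no transitions out of s1 were observed")
    return a / (a + b)

# Hypothetical path with 7 transitions: two s1 -> s1 and two s1 -> s2.
example_path = ["s1", "s1", "s2", "s2", "s1", "s2", "s1", "s1"]
print(mle_theta_from_path(example_path))
```

When no transition out of the first state is observed, the likelihood does not depend on $\theta$, which is why the sketch raises an error in that case.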


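9. The posterior p.d.f. of $\theta$ given $X = x$ satisfies the relation

\[ \xi(\theta \mid x) \propto f(x \mid \theta)\xi(\theta) \propto \exp(-\theta), \quad \text{for } \theta > x. \]

Hence,

\[ \xi(\theta \mid x) = \begin{cases} \exp(x - \theta) & \text{for } \theta > x, \\ 0 & \text{otherwise.} \end{cases} \]

(a) The Bayes estimator is the mean of this posterior distribution, $\hat{\theta} = x + 1$.

(b) The Bayes estimator is the median of this posterior distribution, $\hat{\theta} = x + \log 2$.

10. In this exercise, $\theta$ must lie in the interval $1/3 \le \theta \le 2/3$. Hence, as in Exercise 3 of Sec. 7.5, the M.L.E.
of $\theta$ is

\[ \hat{\theta} = \begin{cases} \bar{X}_n & \text{if } \frac{1}{3} \le \bar{X}_n \le \frac{2}{3}, \\[2pt] \frac{1}{3} & \text{if } \bar{X}_n < \frac{1}{3}, \\[2pt] \frac{2}{3} & \text{if } \bar{X}_n > \frac{2}{3}. \end{cases} \]

It then follows from Sec. 7.6 that $\hat{\beta} = 3\hat{\theta} - 1$.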


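11. Under these conditions, X has a binomial distribution with parameters n and $\theta = \frac{1}{2}p + \frac{1}{4}$.
Since $0 \le p \le 1$, it follows that $1/4 \le \theta \le 3/4$. Hence, as in Exercise 3 of Sec. 7.5, the M.L.E. of $\theta$ is

\[ \hat{\theta} = \begin{cases} \dfrac{X}{n} & \text{if } \frac{1}{4} \le \frac{X}{n} \le \frac{3}{4}, \\[4pt] \frac{1}{4} & \text{if } \frac{X}{n} < \frac{1}{4}, \\[4pt] \frac{3}{4} & \text{if } \frac{X}{n} > \frac{3}{4}. \end{cases} \]

It then follows from Theorem 7.6.1 that the M.L.E. of p is $\hat{p} = 2(\hat{\theta} - 1/4)$.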
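The restricted M.L.E. and the invariance step can be written as a two-line computation. The sketch below assumes the relation $\theta = p/2 + 1/4$ used in this solution; the function name and the sample values are illustrative.

```python
def mle_p(x, n):
    """M.L.E. of p when X is binomial(n, theta) with theta = p/2 + 1/4.

    The unrestricted maximizer x / n is clamped to [1/4, 3/4], and then the
    invariance property gives p-hat = 2 * (theta-hat - 1/4).
    """
    theta_hat = min(max(x / n, 0.25), 0.75)
    return 2 * (theta_hat - 0.25)

print(mle_p(50, 100))  # interior case
print(mle_p(10, 100))  # x/n below 1/4, so theta-hat = 1/4
print(mle_p(90, 100))  # x/n above 3/4, so theta-hat = 3/4
```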


Copyright © 2012 Pearson Education, Inc. Publishing as Addison-Wesley. 


12. The prior distribution of $\theta$ is the Pareto distribution with parameters $x_0 = 1$ and $\alpha = 1$. Therefore,
it follows from Exercise 18 of Sec. 7.3 that the posterior distribution of $\theta$ will be a Pareto distribution
with parameters $\alpha + n$ and $\max\{x_0,x_1,\ldots,x_n\}$. In this exercise $n = 4$ and $\max\{x_0,x_1,\ldots,x_n\} = 1$.
Hence, the posterior Pareto distribution has parameters $\alpha = 5$ and $x_0 = 1$. The Bayes estimate of $\theta$
will be the mean of this posterior distribution, namely

\[ \hat{\theta} = \int_1^\infty \theta\,\frac{5}{\theta^6}\,d\theta = \frac{5}{4}. \]

13. The Bayes estimate of $\theta$ will be the median of the posterior Pareto distribution. This will be the value
m such that

\[ \frac{1}{2} = \int_m^\infty \frac{5}{\theta^6}\,d\theta = \frac{1}{m^5}. \]

Hence, $\hat{\theta} = m = 2^{1/5}$.

14. The joint p.d.f. of $X_1,\ldots,X_n$ can be written in the form

\[ f_n(x \mid \beta,\theta) = \beta^n\exp\left(n\beta\theta - \beta\sum_{i=1}^n x_i\right) \]

for $\min\{x_1,\ldots,x_n\} \ge \theta$, and $f_n(x \mid \beta,\theta) = 0$ otherwise. Hence, by the factorization criterion, $\sum_{i=1}^n X_i$
and $\min\{X_1,\ldots,X_n\}$ is a pair of jointly sufficient statistics and so is any other pair of statistics that
is a one-to-one function of this pair.


The joint p.d.f. of the observations is $\alpha^n x_0^{n\alpha}/\left(\prod_{i=1}^n x_i\right)^{\alpha+1}$ for $\min\{x_1,\ldots,x_n\} \ge x_0$. This p.d.f. is
maximized when $x_0$ is made as large as possible. Thus,

\[ \hat{x}_0 = \min\{X_1,\ldots,X_n\}. \]

Since $\alpha$ is known in Exercise 15, it follows from the factorization criterion, by a technique similar to
that used in Example 7.7.5 or Exercise 12 of Sec. 7.8, that $\min\{X_1,\ldots,X_n\}$ is a sufficient statistic.
Thus, from Theorem 7.8.3, since the M.L.E. $\hat{x}_0$ is a sufficient statistic, it is a minimal sufficient statistic.

It follows from Exercise 15 that $\hat{x}_0 = \min\{X_1,\ldots,X_n\}$ will again be the M.L.E. of $x_0$, since this value
of $x_0$ maximizes the likelihood function regardless of the value of $\alpha$. If we substitute $\hat{x}_0$ for $x_0$ and let
$L(\alpha)$ denote the logarithm of the resulting likelihood function, which was given in Exercise 15, then

\[ L(\alpha) = n\log\alpha + n\alpha\log\hat{x}_0 - (\alpha+1)\sum_{i=1}^n \log x_i \]

and

\[ \frac{dL(\alpha)}{d\alpha} = \frac{n}{\alpha} + n\log\hat{x}_0 - \sum_{i=1}^n \log x_i. \]


Hence, by setting this expression equal to 0, we find that 
ie = 
a= (4 3- tous ox . 
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20. 


21. 
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It can be shown that the pair of estimators <9 and @ found in Exercise 17 form a one-to-one transform 
of the pair of jointly sufficient statistics T, and T> given in Exercise 3 of Sec. 7.8. Hence, % and @ 
are themselves jointly sufficient statistics. It now follows from Sec. 7.8 that #9 and @ must be minimal 
jointly sufficient statistics. 


The p.f. of X is 


f(a|n,p) = ("ora =a". 


The M.L.E. of n will be the value that maximizes this expression for given values of x and p. The ratio 
given in the hint to this exercise reduces to 


n+1 


eS, “lang: 
a (1=p) 


Since R is a decreasing function of n, it follows that f(a|n,p) will be maximized at the smallest value 
of n for which R < 1. After some algebra, it is found that R < 1 if and only if n > x/p— 1. Hence, n 
will be the smallest integer greater than x/p —1. If x/p — 1 is itself an integer, then x/p —1 and a/p 
are both M.L.E.’s. 


The joint p.d.f. of $X_1$ and $X_2$ is $1/(4\theta^2)$ provided that each of the observations lies in either the interval
$(0, \theta)$ or the interval $(2\theta, 3\theta)$. Thus, the M.L.E. of $\theta$ will be the smallest value of $\theta$ for which these
restrictions are satisfied.


(a) If we take $3\theta = 9$, or $\theta = 3$, then $\theta$ will be as small as possible, and the restrictions will be satisfied
because both observed values will lie in the interval $(2\theta, 3\theta)$.


(b) It is not possible that both $X_1$ and $X_2$ lie in the interval $(2\theta, 3\theta)$, because for that to be true it is
necessary that $X_2/X_1 < 3/2$. Here, however, $X_2/X_1 = 9/4$. Therefore, if we take $\theta = 4$, then $\theta$
will be as small as possible and the restrictions will be satisfied because $X_1$ will lie in the interval
$(0, \theta)$ and $X_2$ will lie in $(2\theta, 3\theta)$.

(c) It is not possible that both $X_1$ and $X_2$ lie in the interval $(2\theta, 3\theta)$ for the reason given in part (b).
It is also not possible that $X_1$ lies in $(0, \theta)$ and $X_2$ lies in $(2\theta, 3\theta)$, because for that to be true it is
necessary that $X_2/X_1 > 2$. Here, however, $X_2/X_1 = 9/5$. Hence, it must be true that both $X_1$
and $X_2$ lie in the interval $(0, \theta)$. Under this condition, the smallest possible value of $\theta$ is $\theta = 9$.


The Bayes estimator of @ is the mean of the posterior distribution of 6, and the expected loss or M.S.E. 
of this estimator is the variance of the posterior distribution. This variance, as given by Eq. (7.3.2), is 


\[ v_1^2 = \frac{(100)(25)}{100 + 25n} = \frac{100}{n+4}. \]


Hence, n must be chosen to minimize 


By setting the first derivative equal to 0, it is found that the minimum occurs when n = 16. 


It was shown in Example 7.7.2 that $T = \sum_{i=1}^{n} X_i$ is a sufficient statistic in this problem. Since the sample
variance is not a function of $T$ alone, it follows from Theorem 7.9.1 that it is inadmissible.
Sampling Distributions of Estimators


8.1 The Sampling Distribution of a Statistic 


Solutions to Exercises 


1. The c.d.f. of $U = \max\{X_1, \ldots, X_n\}$ is

\[ F(u) = \begin{cases} 0 & \text{for } u \le 0, \\ (u/\theta)^n & \text{for } 0 < u < \theta, \\ 1 & \text{for } u \ge \theta. \end{cases} \]

Since $U \le \theta$ with probability 1, the event that $|U - \theta| \le 0.1\theta$ is the same as the event that $U \ge 0.9\theta$.
The probability of this is $1 - F(0.9\theta) = 1 - 0.9^n$. In order for this to be at least 0.95, we need $0.9^n \le 0.05$
or $n \ge \log(0.05)/\log(0.9) = 28.43$. So $n \ge 29$ is needed.
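
The sample-size bound above is a one-line computation. The sketch below reproduces it; the function name and its default arguments are illustrative assumptions.

```python
import math

def min_sample_size(rel_error=0.1, prob=0.95):
    """Smallest n with 1 - (1 - rel_error)**n >= prob."""
    bound = math.log(1 - prob) / math.log(1 - rel_error)
    return math.ceil(bound)

n_needed = min_sample_size()
```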


2. It is known that $\bar{X}_n$ has the normal distribution with mean $\theta$ and variance $4/n$. Therefore,
\[ E_\theta(|\bar{X}_n - \theta|^2) = \mathrm{Var}_\theta(\bar{X}_n) = 4/n, \]
and $4/n \le 0.1$ if and only if $n \ge 40$.


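3. Once again, $\bar{X}_n$ has the normal distribution with mean $\theta$ and variance $4/n$. Hence, the random variable
$Z = (\bar{X}_n - \theta)/(2/\sqrt{n})$ will have the standard normal distribution. Therefore,

\[ E_\theta(|Z|) = \int_{-\infty}^{\infty} |z| \frac{1}{\sqrt{2\pi}} \exp(-z^2/2)\,dz = \sqrt{\frac{2}{\pi}} \int_0^{\infty} z \exp(-z^2/2)\,dz = \sqrt{\frac{2}{\pi}}, \]

and

\[ E_\theta(|\bar{X}_n - \theta|) = \frac{2}{\sqrt{n}} E_\theta(|Z|) = 2\sqrt{\frac{2}{n\pi}}. \]

But $2\sqrt{2/(n\pi)} \le 0.1$ if and only if $n \ge 800/\pi = 254.6$. Hence, $n$ must be at least 255.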


4. If $Z$ is defined as in the solution of Exercise 3, then
\[ \Pr(|\bar{X}_n - \theta| \le 0.1) = \Pr(|Z| \le 0.05\sqrt{n}) = 2\Phi(0.05\sqrt{n}) - 1. \]
Therefore, this value will be at least 0.95 if and only if $\Phi(0.05\sqrt{n}) \ge 0.975$. It is found from a table of
values of $\Phi$ that we must have $0.05\sqrt{n} \ge 1.96$. Therefore, we must have $n \ge 1536.64$ or, since $n$ must
be an integer, $n \ge 1537$.
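
The sample sizes in Exercises 3 and 4 can be reproduced without tables. This is a minimal sketch using Python's standard library; the variable names are illustrative assumptions, and `NormalDist().inv_cdf` stands in for the table of the standard normal c.d.f.

```python
import math
from statistics import NormalDist

sigma = 2.0   # standard deviation of each observation (variance 4)
tol = 0.1     # target accuracy used in Exercises 3 and 4

# Exercise 3: E|Xbar - theta| = sigma * sqrt(2 / (n * pi)) <= tol
n_abs_error = math.ceil(2 * sigma**2 / (math.pi * tol**2))

# Exercise 4: Pr(|Xbar - theta| <= tol) >= 0.95
z = NormalDist().inv_cdf(0.975)          # approximately 1.96
n_coverage = math.ceil((z * sigma / tol) ** 2)
```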
5. When $p = 0.2$, the random variable $Z_n = n\bar{X}_n$ will have a binomial distribution with parameters $n$ and
$p = 0.2$, and
\[ \Pr(|\bar{X}_n - p| \le 0.1) = \Pr(0.1n \le Z_n \le 0.3n). \]

The value of $n$ for which this probability will be at least 0.75 must be determined by trial and error
from the table of the binomial distribution in the back of the book. For $n = 8$, the probability becomes
\[ \Pr(0.8 \le Z_8 \le 2.4) = \Pr(Z_8 = 1) + \Pr(Z_8 = 2) = 0.3355 + 0.2936 = 0.6291. \]
For $n = 9$, we have
\[ \Pr(0.9 \le Z_9 \le 2.7) = \Pr(Z_9 = 1) + \Pr(Z_9 = 2) = 0.3020 + 0.3020 = 0.6040. \]
For $n = 10$, we have
\[ \Pr(1 \le Z_{10} \le 3) = \Pr(Z_{10} = 1) + \Pr(Z_{10} = 2) + \Pr(Z_{10} = 3) = 0.2684 + 0.3020 + 0.2013 = 0.7717. \]

Hence, $n = 10$ is sufficient.

It should be noted that although a sample size of $n = 10$ will meet the required conditions, a sample
size of $n = 11$ will not meet the required conditions. For $n = 11$, we would have
\[ \Pr(1.1 \le Z_{11} \le 3.3) = \Pr(Z_{11} = 2) + \Pr(Z_{11} = 3). \]

Thus, only two terms of the binomial distribution for $n = 11$ are included, whereas three terms of the
binomial distribution for $n = 10$ were included.
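
The trial-and-error search through the binomial table can also be carried out directly. The sketch below is one way to do it; the helper names are assumptions, and the bounds are tested in integer arithmetic so that round-off at 0.1n and 0.3n cannot drop an endpoint.

```python
import math

def binom_pf(k, n, p):
    """Binomial p.f. evaluated at k."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def coverage(n, p=0.2):
    """Pr(0.1 n <= Z_n <= 0.3 n) for Z_n binomial with parameters n and p.

    The bounds are checked as n <= 10 k <= 3 n to avoid floating-point
    round-off at the endpoints.
    """
    return sum(binom_pf(k, n, p) for k in range(n + 1) if n <= 10 * k <= 3 * n)

probs = {n: coverage(n) for n in (8, 9, 10, 11)}
```

Evaluating `probs` shows the non-monotone behavior noted above: the probability for n = 11 is smaller than the one for n = 10.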


6. It is known that when $p = 0.2$, $E(\bar{X}_n) = p = 0.2$ and $\mathrm{Var}(\bar{X}_n) = (0.2)(0.8)/n = 0.16/n$. Therefore,
$Z = (\bar{X}_n - 0.2)/(0.4/\sqrt{n})$ will have approximately a standard normal distribution. It now follows that
\[ \Pr(|\bar{X}_n - p| \le 0.1) = \Pr(|Z| \le 0.25\sqrt{n}) \approx 2\Phi(0.25\sqrt{n}) - 1. \]

Therefore, this value will be at least 0.95 if and only if $\Phi(0.25\sqrt{n}) \ge 0.975$ or, equivalently, if and only
if $0.25\sqrt{n} \ge 1.96$. This final relation is satisfied if and only if $n \ge 61.5$. Therefore, the sample size must
be $n \ge 62$.


7. It follows from the results given in the solution to Exercise 6 that, when $p = 0.2$,

\[ E_p(|\bar{X}_n - p|^2) = \mathrm{Var}(\bar{X}_n) = \frac{0.16}{n}, \]

and $0.16/n \le 0.01$ if and only if $n \ge 16$.


8. For an arbitrary value of $p$,

\[ E_p(|\bar{X}_n - p|^2) = \mathrm{Var}(\bar{X}_n) = \frac{p(1-p)}{n}. \]

This variance will be a maximum when $p = 1/2$, at which point its value is $1/(4n)$. Therefore, this
variance will be not greater than 0.01 for all values of $p$ $(0 < p < 1)$ if and only if $1/(4n) \le 0.01$ or,
equivalently, if and only if $n \ge 25$.


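9. The M.L.E. is $\hat{\theta} = n/T$, where $T$ was shown to have the gamma distribution with parameters $n$ and
$\theta$. Let $G(\cdot)$ denote the c.d.f. of the sampling distribution of $T$. Let $H(\cdot)$ be the c.d.f. of the sampling
distribution of $\hat{\theta}$. Then $H(t) = 0$ for $t \le 0$, and for $t > 0$,

\[ H(t) = \Pr(\hat{\theta} \le t) = \Pr\left(\frac{n}{T} \le t\right) = \Pr\left(T \ge \frac{n}{t}\right) = 1 - G\left(\frac{n}{t}\right). \]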
8.2 The Chi-Square Distributions


Commentary 


If one is using the software R, then the functions dchisq, pchisq, and qchisq give the p.d.f., the c.d.f.,
and the quantile function of $\chi^2$ distributions. The syntax is that the first argument is the argument of the
function, and the second is the degrees of freedom. The function rchisq gives a random sample of $\chi^2$ random
variables. The first argument is how many you want, and the second is the degrees of freedom. All of the
solutions that require the calculation of $\chi^2$ probabilities or quantiles can be done using these functions instead
of tables.


Solutions to Exercises 


1. The distribution of $20T/0.09$ is the $\chi^2$ distribution with 20 degrees of freedom. We can write $\Pr(T \le c) =
\Pr(20T/0.09 \le 20c/0.09)$. In order for this probability to be 0.9, we need $20c/0.09$ to equal the 0.9
quantile of the $\chi^2$ distribution with 20 degrees of freedom. That quantile is 28.41. Set $28.41 = 20c/0.09$
and solve for $c = 0.1278$.
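
For readers without R or a table at hand, the quantile used above can be recovered from the closed-form c.d.f. that the $\chi^2$ distribution has when the degrees of freedom are even. The sketch below is an illustration under that assumption only; the function names are hypothetical, and bisection is a generic root-finder swapped in for a table lookup.

```python
import math

def chi2_cdf_even(x, df):
    """c.d.f. of the chi-square distribution when df is an even integer."""
    if x <= 0:
        return 0.0
    half = x / 2
    terms = sum(half**j / math.factorial(j) for j in range(df // 2))
    return 1 - math.exp(-half) * terms

def chi2_quantile_even(prob, df, hi=1000.0):
    """Quantile by bisection on the c.d.f. (even df only)."""
    lo = 0.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_cdf_even(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

q = chi2_quantile_even(0.9, 20)   # 0.9 quantile, 20 degrees of freedom
c = 0.09 * q / 20                 # solves q = 20 c / 0.09
```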


2. The mode will be the value of $x$ at which the p.d.f. $f(x)$ is a maximum or, equivalently, the value of $x$
at which $\log f(x)$ is a maximum. We have

\[ \log f(x) = (\text{const.}) + \left(\frac{m}{2} - 1\right)\log x - \frac{x}{2}. \]

If $m = 1$, this function is strictly decreasing and increases without bound as $x \to 0$. If $m = 2$, this
function is strictly decreasing and attains its maximum value when $x = 0$. If $m \ge 3$, the value of $x$ at
which the maximum is attained can be found by setting the derivative with respect to $x$ equal to 0. In
this way it is found that $x = m - 2$.


3. The median of each distribution is found from the table of the $\chi^2$ distribution given at the end of the
book.

[Plots of the p.d.f. for (a) m = 1, (b) m = 2, (c) m = 3, and (d) m = 4, each marking the mode, median, and mean.]

Figure S.8.1: First figure for Exercise 3 of Sec. 8.2.

Figure S.8.2: Second figure for Exercise 3 of Sec. 8.2.
4. Let $r$ denote the radius of the circle. The point $(X, Y)$ will lie inside the circle if and only if
$X^2 + Y^2 < r^2$. Also, $X^2 + Y^2$ has a $\chi^2$ distribution with two degrees of freedom. It is found from the
table at the end of the book that $\Pr(X^2 + Y^2 \le 9.210) = 0.99$. Therefore, we must have $r^2 \ge 9.210$.


5. We must determine $\Pr(X^2 + Y^2 + Z^2 \le 1)$. Since $X^2 + Y^2 + Z^2$ has the $\chi^2$ distribution with three
degrees of freedom, it is found from the table at the end of the book that the required probability is
slightly less than 0.20.


6. We must determine the probability that, at time 2, $X^2 + Y^2 + Z^2 \le 16\sigma^2$. At time 2, each of the
independent variables $X$, $Y$, and $Z$ will have a normal distribution with mean 0 and variance $2\sigma^2$.
Therefore, each of the variables $X/(\sqrt{2}\sigma)$, $Y/(\sqrt{2}\sigma)$, and $Z/(\sqrt{2}\sigma)$ will have a standard normal distribution.
Hence, $V = (X^2 + Y^2 + Z^2)/(2\sigma^2)$ will have a $\chi^2$ distribution with three degrees of freedom. It now
follows that

\[ \Pr(X^2 + Y^2 + Z^2 \le 16\sigma^2) = \Pr(V \le 8). \]

It can now be found from the table at the end of the book that this probability is slightly greater than
0.95.


7. By the probability integral transformation, we know that $T_i = F_i(X_i)$ has a uniform distribution on
the interval $[0, 1]$. Now let $Z_i = -\log T_i$. We shall determine the p.d.f. $g$ of $Z_i$. The p.d.f. of $T_i$ is

\[ f(t) = \begin{cases} 1 & \text{for } 0 < t < 1, \\ 0 & \text{otherwise.} \end{cases} \]

Since $T_i = \exp(-Z_i)$, we have $dt/dz = -\exp(-z)$. Therefore, for $z > 0$,

\[ g(z) = f(\exp(-z)) \left|\frac{dt}{dz}\right| = \exp(-z). \]

Thus, it is now seen that $Z_i$ has the exponential distribution with parameter $\beta = 1$ or, in other words,
the gamma distribution with parameters $\alpha = 1$ and $\beta = 1$. Therefore, by Exercise 1 of Sec. 5.7, $2Z_i$ has
the gamma distribution with parameters $\alpha = 1$ and $\beta = 1/2$. Finally, by Theorem 5.7.7, $\sum_{i=1}^{n} 2Z_i$ will
have the gamma distribution with parameters $\alpha = n$ and $\beta = 1/2$ or, equivalently, the $\chi^2$ distribution
with $2n$ degrees of freedom.
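
A quick simulation gives an informal check of this result: a $\chi^2$ random variable with $2n$ degrees of freedom has mean $2n$ and variance $4n$. The sketch below is illustrative only; the seed, the choice n = 3, and the number of replications are arbitrary assumptions.

```python
import math
import random

def neg2_log_sum(n, rng):
    """Draw -2 * sum(log T_i) with T_i independent and uniform on (0, 1]."""
    # 1 - rng.random() lies in (0, 1], which avoids log(0).
    return -2 * sum(math.log(1 - rng.random()) for _ in range(n))

rng = random.Random(12345)   # fixed seed for reproducibility
n = 3
draws = [neg2_log_sum(n, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)
```

The sample mean and variance should land near 6 and 12, the moments of the $\chi^2$ distribution with six degrees of freedom.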


8. It was shown in Sec. 3.9 that the p.d.f. of $W$ is as follows, for $0 < w < 1$:
\[ h_1(w) = n(n-1)w^{n-2}(1-w). \]

Let $X = 2n(1 - W)$. Then $W = 1 - X/(2n)$ and $dw/dx = -1/(2n)$. Therefore, the p.d.f. $g_n(x)$ is as
follows, for $0 < x < 2n$:

\[ g_n(x) = n(n-1)\left(1 - \frac{x}{2n}\right)^{n-2} \frac{x}{2n} \cdot \frac{1}{2n} = \frac{1}{4} \cdot \frac{n-1}{n}\, x \left(1 - \frac{x}{2n}\right)^{-2} \left(1 - \frac{x}{2n}\right)^{n}. \]

Now, as $n \to \infty$,

\[ \frac{n-1}{n} \to 1 \quad \text{and} \quad \left(1 - \frac{x}{2n}\right)^{-2} \to 1. \]

Also, for any real number $t$, $(1 + t/n)^n \to \exp(t)$. Therefore, $(1 - x/(2n))^n \to \exp(-x/2)$. Hence, for
$x > 0$,

\[ g_n(x) \to \frac{1}{4}\, x \exp(-x/2). \]

This limit is the p.d.f. of the $\chi^2$ distribution with four degrees of freedom.


9. It is known that $\bar{X}_n$ has the normal distribution with mean $\mu$ and variance $\sigma^2/n$. Therefore,
$(\bar{X}_n - \mu)/(\sigma/\sqrt{n})$ has a standard normal distribution and the square of this variable has the $\chi^2$ distribution
with one degree of freedom.


10. Each of the variables $X_1 + X_2 + X_3$ and $X_4 + X_5 + X_6$ will have the normal distribution with mean
0 and variance 3. Therefore, if each of them is divided by $\sqrt{3}$, each will have a standard normal
distribution. Therefore, the square of each will have the $\chi^2$ distribution with one degree of freedom
and the sum of these two squares will have the $\chi^2$ distribution with two degrees of freedom. In other
words, $Y/3$ will have the $\chi^2$ distribution with two degrees of freedom.


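11. The simplest way to determine the mean is to calculate $E(X^{1/2})$ directly, where $X$ has the $\chi^2$ distribution
with $n$ degrees of freedom. Thus,

\[ E(X^{1/2}) = \int_0^\infty x^{1/2} \frac{1}{2^{n/2}\Gamma(n/2)} x^{(n/2)-1} \exp(-x/2)\,dx = \frac{1}{2^{n/2}\Gamma(n/2)} \int_0^\infty x^{(n-1)/2} \exp(-x/2)\,dx \]
\[ = \frac{1}{2^{n/2}\Gamma(n/2)} \cdot 2^{(n+1)/2}\, \Gamma[(n+1)/2] = \frac{\sqrt{2}\,\Gamma[(n+1)/2]}{\Gamma(n/2)}. \]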


12. For general $\sigma^2$,

\[ \Pr(Y \le 0.09) = \Pr\left(W \le \frac{10 \times 0.09}{\sigma^2}\right), \qquad \text{(S.8.1)} \]

where $W = 10Y/\sigma^2$ has the $\chi^2$ distribution with 10 degrees of freedom. The probability in (S.8.1) is
at least 0.9 if and only if $0.9/\sigma^2$ is at least the 0.9 quantile of the $\chi^2$ distribution with 10 degrees of
freedom. This quantile is 15.99, so $0.9/\sigma^2 \ge 15.99$ is equivalent to $\sigma^2 \le 0.0563$.


13. We already found that the distribution of $W = n\hat{\sigma}_0^2/\sigma^2$ is the $\chi^2$ distribution with $n$ degrees of freedom,
which is also the gamma distribution with parameters $n/2$ and $1/2$. If we multiply a gamma random
variable by a constant, we change its distribution to another gamma distribution with the same first
parameter and the second parameter gets divided by the constant. (See Exercise 1 in Sec. 6.3.) Since
$\hat{\sigma}_0^2 = (\sigma^2/n)W$, we see that the distribution of $\hat{\sigma}_0^2$ is the gamma distribution with parameters $n/2$ and
$n/(2\sigma^2)$.
8.3 Joint Distribution of the Sample Mean and Sample Variance


Commentary 


This section contains some relatively mathematical results that rely on some matrix theory. We prove 
the statistical independence of the sample average and sample variance. We also derive the distribution 
of the sample variance. If your course does not focus on the mathematical details, then you can safely 
cite Theorem 8.3.1 and look at the examples without going through the orthogonal matrix results. The 
mathematical derivations rely on a calculation involving Jacobians (Sec. 3.9) which the instructor might have 
skipped earlier in the course. 


Solutions to Exercises 


1. We found that $U = n\hat{\sigma}^2/\sigma^2$ has the $\chi^2$ distribution with $n - 1$ degrees of freedom, which is also the
gamma distribution with parameters $(n-1)/2$ and $1/2$. If we multiply a gamma random variable by
a number $c$, we change the second parameter by dividing it by $c$. So, with $c = \sigma^2/n$, we find that
$cU = \hat{\sigma}^2$ has the gamma distribution with parameters $(n-1)/2$ and $n/(2\sigma^2)$.


2. It can be verified that the matrices in (a), (b), and (e) are orthogonal because in each case the sum of 
the squares of the elements in each row is 1 and the sum of the products of the corresponding terms 
in any two different rows is 0. The matrix in (c) is not orthogonal because the sum of squares for the 
bottom row is not 1. The matrix in (d) is not orthogonal because the sum of the products for rows 1 
and 2 (or any other two rows) is not 0. 


3. (a) Consider the matrix

\[ A = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ a_1 & a_2 \end{bmatrix}. \]

For $A$ to be orthogonal, we must have $a_1^2 + a_2^2 = 1$ and $\frac{1}{\sqrt{2}}a_1 + \frac{1}{\sqrt{2}}a_2 = 0$. It follows from the
second equation that $a_1 = -a_2$ and, in turn, from the first equation that $a_1^2 = 1/2$. Hence, either
the pair of values $a_1 = 1/\sqrt{2}$ and $a_2 = -1/\sqrt{2}$ or the pair $a_1 = -1/\sqrt{2}$ and $a_2 = 1/\sqrt{2}$ will make
$A$ orthogonal.


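(b) Consider the matrix

\[ A = \begin{bmatrix} 1/\sqrt{3} & 1/\sqrt{3} & 1/\sqrt{3} \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{bmatrix}. \]

For $A$ to be orthogonal, we must have

\[ a_1^2 + a_2^2 + a_3^2 = 1 \]

and

\[ \frac{1}{\sqrt{3}}a_1 + \frac{1}{\sqrt{3}}a_2 + \frac{1}{\sqrt{3}}a_3 = 0. \]

Therefore, $a_3 = -a_1 - a_2$ and it follows from the first equation that

\[ a_1^2 + a_2^2 + (a_1 + a_2)^2 = 2a_1^2 + 2a_2^2 + 2a_1a_2 = 1. \]

Any values of $a_1$ and $a_2$ satisfying this equation can be chosen. We shall use $a_1 = 2/\sqrt{6}$ and
$a_2 = -1/\sqrt{6}$. Then $a_3 = -1/\sqrt{6}$.

Finally, we must have $b_1^2 + b_2^2 + b_3^2 = 1$ as well as

\[ \frac{1}{\sqrt{3}}b_1 + \frac{1}{\sqrt{3}}b_2 + \frac{1}{\sqrt{3}}b_3 = 0 \]

and

\[ \frac{2}{\sqrt{6}}b_1 - \frac{1}{\sqrt{6}}b_2 - \frac{1}{\sqrt{6}}b_3 = 0. \]

This final pair of equations can be rewritten as

\[ b_2 + b_3 = -b_1 \quad \text{and} \quad b_2 + b_3 = 2b_1. \]

Therefore, $b_1 = 0$ and $b_2 = -b_3$. Since we must have $b_2^2 + b_3^2 = 1$, it follows that we can use either
$b_2 = 1/\sqrt{2}$ and $b_3 = -1/\sqrt{2}$ or $b_2 = -1/\sqrt{2}$ and $b_3 = 1/\sqrt{2}$. Thus, one orthogonal matrix is

\[ A = \begin{bmatrix} 1/\sqrt{3} & 1/\sqrt{3} & 1/\sqrt{3} \\ 2/\sqrt{6} & -1/\sqrt{6} & -1/\sqrt{6} \\ 0 & 1/\sqrt{2} & -1/\sqrt{2} \end{bmatrix}. \]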

4. The $3 \times 3$ matrix of coefficients of this transformation is

\[ A = \begin{bmatrix} 0.8 & 0.6 & 0 \\ (0.3)\sqrt{2} & -(0.4)\sqrt{2} & -(0.5)\sqrt{2} \\ (0.3)\sqrt{2} & -(0.4)\sqrt{2} & (0.5)\sqrt{2} \end{bmatrix}. \]

Since the matrix $A$ is orthogonal, it follows from Theorem 8.3.4 that $Y_1$, $Y_2$, and $Y_3$ are independent
and each has a standard normal distribution.


5. Let $Z_i = (X_i - \mu)/\sigma$ for $i = 1, 2$. Then $Z_1$ and $Z_2$ are independent and each has a standard normal
distribution. Next, let $Y_1 = (Z_1 + Z_2)/\sqrt{2}$ and $Y_2 = (Z_1 - Z_2)/\sqrt{2}$. Then the $2 \times 2$ matrix of coefficients
of this transformation is

\[ A = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{bmatrix}. \]

Since the matrix $A$ is orthogonal, it follows from Theorem 8.3.4 that $Y_1$ and $Y_2$ are also independent
and each has a standard normal distribution. Finally, let $W_1 = X_1 + X_2$ and $W_2 = X_1 - X_2$. Then
$W_1 = \sqrt{2}\sigma Y_1 + 2\mu$ and $W_2 = \sqrt{2}\sigma Y_2$. Since $Y_1$ and $Y_2$ are independent, it now follows from Exercise 15
of Sec. 3.9 that $W_1$ and $W_2$ are also independent.


6. (a) Since $(X_i - \mu)/\sigma$ has a standard normal distribution for $i = 1, \ldots, n$, then $W = \sum_{i=1}^{n} (X_i - \mu)^2/\sigma^2$
has the $\chi^2$ distribution with $n$ degrees of freedom. The required probability can be rewritten as
follows:

\[ \Pr\left(\frac{n}{2} \le W \le 2n\right). \]

Thus, when $n = 16$, we must evaluate $\Pr(8 \le W \le 32) = \Pr(W \le 32) - \Pr(W \le 8)$, where $W$ has
the $\chi^2$ distribution with 16 degrees of freedom. It is found from the table at the end of the book
that $\Pr(W \le 32) = 0.99$ and $\Pr(W \le 8) = 0.05$.

(b) By Theorem 8.3.1, $V = \sum_{i=1}^{n} (X_i - \bar{X}_n)^2/\sigma^2$ has the $\chi^2$ distribution with $n - 1$ degrees of freedom.
The required probability can be rewritten as follows:

\[ \Pr\left(\frac{n}{2} \le V \le 2n\right). \]

Thus, when $n = 16$, we must evaluate $\Pr(8 \le V \le 32) = \Pr(V \le 32) - \Pr(V \le 8)$, where $V$ has the
$\chi^2$ distribution with 15 degrees of freedom. It is found from the table that $\Pr(V \le 32) = 0.993$
and $\Pr(V \le 8) = 0.079$.


7. (a) The random variable $V = n\hat{\sigma}^2/\sigma^2$ has a $\chi^2$ distribution with $n - 1$ degrees of freedom. The
required condition can be written in the form $\Pr(V \le 1.5n) \ge 0.95$. By trial and error, it is
found that for $n = 20$, $V$ has 19 degrees of freedom and $\Pr(V \le 30) < 0.95$. However, for $n = 21$,
$V$ has 20 degrees of freedom and $\Pr(V \le 31.5) > 0.95$.


(b) The required probability can be written in the form

\[ \Pr\left(\frac{n}{2} \le V \le \frac{3n}{2}\right) = \Pr\left(V \le \frac{3n}{2}\right) - \Pr\left(V \le \frac{n}{2}\right), \]

where $V$ again has the $\chi^2$ distribution with $n - 1$ degrees of freedom. By trial and error, it is
found that for $n = 12$, $V$ has 11 degrees of freedom and

\[ \Pr(V \le 18) - \Pr(V \le 6) = 0.915 - 0.130 < 0.8. \]
However, for $n = 13$, $V$ has 12 degrees of freedom and
\[ \Pr(V \le 19.5) - \Pr(V \le 6.5) = 0.919 - 0.113 > 0.8. \]
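
The trial-and-error step in part (b) can be reproduced with a direct evaluation of the $\chi^2$ c.d.f. The sketch below uses the standard power series for the regularized incomplete gamma function, which is an implementation choice made here rather than anything taken from the text; the function names are hypothetical. Exact values can differ in the third decimal place from values interpolated in a table.

```python
import math

def chi2_cdf(x, df):
    """Chi-square c.d.f. via the series for the regularized incomplete gamma function."""
    if x <= 0:
        return 0.0
    a, z = df / 2, x / 2
    term = 1.0 / a
    total = term
    denom = a
    for _ in range(10000):
        denom += 1
        term *= z / denom
        total += term
        if term < 1e-16 * total:
            break
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def prob_7b(n):
    """Pr(n/2 <= V <= 3n/2) where V is chi-square with n - 1 degrees of freedom."""
    return chi2_cdf(1.5 * n, n - 1) - chi2_cdf(0.5 * n, n - 1)

probs = {n: prob_7b(n) for n in (12, 13)}
```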


8. If $X$ has the $\chi^2$ distribution with 200 degrees of freedom, then it follows from Theorem 8.2.2 that $X$
can be represented as the sum of 200 independent and identically distributed random variables, each
of which has a $\chi^2$ distribution with one degree of freedom. Since $E(X) = 200$ and $\mathrm{Var}(X) = 400$,
it follows from the central limit theorem that $Z = (X - 200)/20$ will have approximately a standard
normal distribution. Therefore,

\[ \Pr(160 < X < 240) = \Pr(-2 < Z < 2) \approx 2\Phi(2) - 1 = 0.9546. \]
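
The normal approximation above takes two lines with Python's standard library. This is a small sketch with illustrative variable names; the computed value agrees with the tabled 0.9546 up to rounding in the last digit.

```python
from statistics import NormalDist

df = 200
mean, sd = df, (2 * df) ** 0.5             # E(X) = 200, Var(X) = 400
approx = NormalDist(mean, sd)
prob = approx.cdf(240) - approx.cdf(160)   # equals 2 * Phi(2) - 1
```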


9. The sample mean and the sample variance are independent. Therefore, the information that the sample
variance is closer to $\sigma^2$ in one sample than it is in the other sample provides no information about which
of the two sample means will be closer to $\mu$. In other words, in either sample, the conditional distribution
of $\bar{X}_n$, given the observed value of the sample variance, is still the normal distribution with mean $\mu$
and variance $\sigma^2/n$.


8.4 The t Distributions 


Commentary 


In this section, we derive the p.d.f. of the $t$ distribution. That portion of the section (entitled “Derivation
of the p.d.f.”) can be skipped by instructors who do not wish to focus on mathematical details. Indeed,
the derivation involves the use of Jacobians (Sec. 3.9) that the instructor might have skipped earlier in the
course.

If one is using the software R, then the functions dt, pt, and qt give the p.d.f., the c.d.f., and the quantile
function of $t$ distributions. The syntax is that the first argument is the argument of the function, and the
second is the degrees of freedom. The function rt gives a random sample of $t$ random variables. The first
argument is how many you want, and the second is the degrees of freedom. All of the solutions that require
the calculation of $t$ probabilities or quantiles can be done using these functions instead of tables.
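Solutions to Exercises


1. We have

\[ E(X^2) = c \int_{-\infty}^{\infty} x^2 \left(1 + \frac{x^2}{n}\right)^{-(n+1)/2} dx = 2c \int_0^{\infty} x^2 \left(1 + \frac{x^2}{n}\right)^{-(n+1)/2} dx, \]

where $c = \dfrac{\Gamma[(n+1)/2]}{(n\pi)^{1/2}\,\Gamma(n/2)}$. If $y$ is defined as in the hint for this exercise, then $x = \left(\dfrac{ny}{1-y}\right)^{1/2}$ and
$\dfrac{dx}{dy} = \dfrac{\sqrt{n}}{2}\, y^{-1/2}(1-y)^{-3/2}$. Therefore,

\[ E(X^2) = \sqrt{n}\, c \int_0^1 \frac{ny}{1-y} (1-y)^{(n+1)/2}\, y^{-1/2} (1-y)^{-3/2}\, dy \]
\[ = n^{3/2} c \int_0^1 y^{1/2} (1-y)^{(n-4)/2}\, dy \]
\[ = n^{3/2} c\, \frac{\Gamma(3/2)\,\Gamma[(n-2)/2]}{\Gamma[(n+1)/2]} = n \pi^{-1/2}\, \Gamma\left(\frac{3}{2}\right) \frac{\Gamma[(n-2)/2]}{\Gamma(n/2)} \]
\[ = n \pi^{-1/2} \left(\frac{1}{2}\sqrt{\pi}\right) \frac{1}{(n-2)/2} = \frac{n}{n-2}. \]

Since $E(X) = 0$, it now follows that $\mathrm{Var}(X) = n/(n-2)$.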


2. Since $\hat{\mu} = \bar{X}_n$ and $\hat{\sigma}^2 = S_n^2/n$, it follows from the definition of $U$ in Eq. (8.4.4) that

\[ \Pr(\hat{\mu} > \mu + k\hat{\sigma}) = \Pr\left(\frac{\bar{X}_n - \mu}{\hat{\sigma}} > k\right) = \Pr[U > k(n-1)^{1/2}]. \]

Since $U$ has the $t$ distribution with $n - 1$ degrees of freedom and $n = 17$, we must choose $k$ such that
$\Pr(U > 4k) = 0.95$. It is found from a table of the $t$ distribution with 16 degrees of freedom that
$\Pr(U < 1.746) = 0.95$. Hence, by symmetry, $\Pr(U > -1.746) = 0.95$. It now follows that $4k = -1.746$
and $k = -0.4365$.


3. $X_1 + X_2$ has the normal distribution with mean 0 and variance 2. Therefore, $Y = (X_1 + X_2)/\sqrt{2}$ has
a standard normal distribution. Also, $Z = X_3^2 + X_4^2 + X_5^2$ has the $\chi^2$ distribution with 3 degrees of
freedom, and $Y$ and $Z$ are independent. Therefore, $U = \dfrac{Y}{(Z/3)^{1/2}}$ has the $t$ distribution with 3 degrees
of freedom. Thus, if we choose $c = \sqrt{3/2}$, the given random variable will be equal to $U$.


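4. Let $y = x/2$. Then

\[ \int_{-\infty}^{2.5} \frac{dx}{(12 + x^2)^2} = \frac{1}{144} \int_{-\infty}^{2.5} \left(1 + \frac{x^2}{12}\right)^{-2} dx = \frac{1}{72} \int_{-\infty}^{1.25} \left(1 + \frac{y^2}{3}\right)^{-2} dy = \frac{1}{72} \cdot \frac{\Gamma(3/2)(3\pi)^{1/2}}{\Gamma(2)} \int_{-\infty}^{1.25} g_3(y)\, dy, \]

where $g_3(y)$ is the p.d.f. of the $t$ distribution with 3 degrees of freedom. It is found from the table of
this distribution that the value of the integral is 0.85. Hence, the desired value is

\[ \frac{\Gamma(3/2)(3\pi)^{1/2}}{\Gamma(2)} \cdot \frac{1}{72} (0.85) = \frac{\sqrt{3}\,\pi (0.85)}{144} \approx 0.0321. \]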
5. Let $\bar{X}_2 = (X_1 + X_2)/2$ and $S_2^2 = \sum_{i=1}^{2} (X_i - \bar{X}_2)^2$. Then

\[ W = \frac{(X_1 + X_2)^2}{(X_1 - X_2)^2} = \frac{2\bar{X}_2^2}{S_2^2}. \]

It follows from Eq. (8.4.4) that $U = \sqrt{2}\,\bar{X}_2/\sqrt{S_2^2}$ has the $t$ distribution with one degree of freedom.
Since $W = U^2$, we have

\[ \Pr(W < 4) = \Pr(-2 < U < 2) = 2\Pr(U < 2) - 1. \]

It can be found from a table of the $t$ distribution with one degree of freedom that $\Pr(U < 2)$ is just
slightly greater than 0.85. Hence, $\Pr(W < 4) = 0.70$.


6. The distribution of $U = (20)^{1/2}(\bar{X}_{20} - \mu)/\sigma'$ is a $t$ distribution with 19 degrees of freedom. Let $v$ be
the 0.95 quantile of this $t$ distribution, namely 1.729. Then

\[ 0.95 = \Pr(U \le 1.729) = \Pr(\bar{X}_{20} \le \mu + 1.729/(20)^{1/2}\,\sigma'). \]

It follows that we want $c = 1.729/(20)^{1/2} = 0.3866$.


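7. According to Theorem 5.7.4,

\[ \lim_{m \to \infty} \frac{(2\pi)^{1/2} (m + 1/2)^m \exp(-m - 1/2)}{\Gamma(m + 1/2)} = 1, \]
\[ \lim_{m \to \infty} \frac{(2\pi)^{1/2}\, m^{m - 1/2} \exp(-m)}{\Gamma(m)} = 1. \]

Taking the ratio of the above and dividing by $m^{1/2}$, we get

\[ \lim_{m \to \infty} \frac{\Gamma(m + 1/2)}{\Gamma(m)\, m^{1/2}} = \lim_{m \to \infty} \frac{(2\pi)^{1/2} (m + 1/2)^m \exp(-m - 1/2)}{(2\pi)^{1/2}\, m^{m - 1/2} \exp(-m)\, m^{1/2}} \]
\[ = \lim_{m \to \infty} \left(\frac{m + 1/2}{m}\right)^m \exp(-1/2) \]
\[ = 1, \]

where the last equality follows from Theorem 5.3.3 applied to $(1 + 1/(2m))^m$.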


8. Let $f$ be the p.d.f. of $X$ and let $g$ be the p.d.f. of $Y$. Define

\[ h(c) = \Pr(-c < X < c) - \Pr(-c < Y < c) = \int_{-c}^{c} [f(x) - g(x)]\,dx. \qquad \text{(S.8.2)} \]

Suppose that $c_0$ can be chosen so that $f(x) > g(x)$ for all $-c_0 < x < c_0$ and $f(x) < g(x)$ for all $|x| > c_0$.
It should now be clear that $h(c_0) = \max_c h(c)$. To prove this, first let $c > c_0$. Then

\[ h(c) = h(c_0) + \int_{-c}^{-c_0} [f(x) - g(x)]\,dx + \int_{c_0}^{c} [f(x) - g(x)]\,dx. \]

Since $f(x) - g(x) < 0$ for all $x$ in these last two integrals, $h(c) < h(c_0)$. Similarly, if $0 < c < c_0$,

\[ h(c) = h(c_0) - \int_{-c_0}^{-c} [f(x) - g(x)]\,dx - \int_{c}^{c_0} [f(x) - g(x)]\,dx. \]

Since $f(x) - g(x) > 0$ for all $x$ in these last two integrals, $h(c) < h(c_0)$. Finally, notice that the standard
normal p.d.f. is greater than the $t$ p.d.f. with five degrees of freedom for all $-c < x < c$ if $c = 1.63$ and
the normal p.d.f. is smaller than the $t$ p.d.f. if $|x| > 1.63$.
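
The crossing point 1.63 quoted above can be located numerically by bisection on the difference of the two p.d.f.s. The sketch below is an illustration; the function names and the bracketing interval (0.5, 3.0) are assumptions, chosen because the normal p.d.f. is larger at 0.5 and smaller at 3.0.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """p.d.f. of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def crossing_point(df, lo=0.5, hi=3.0):
    """Bisection for the positive x where the normal and t p.d.f.s are equal."""
    def diff(x):
        return NormalDist().pdf(x) - t_pdf(x, df)
    for _ in range(100):
        mid = (lo + hi) / 2
        if diff(mid) > 0:
            lo = mid      # normal p.d.f. still larger: move right
        else:
            hi = mid
    return (lo + hi) / 2

c0 = crossing_point(5)
```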


8.5 Confidence Intervals 


Commentary 


This section ends with an extended discussion of shortcomings of confidence intervals. The first paragraph 
on interpretation is fairly straightforward. Students at this level should be able to understand what the 
confidence statement is and is not saying. The long Example 8.5.11 illustrates how additional information that 
is available can be ignored in the confidence statement. Instructors should gauge the mathematical abilities 
of their students before discussing this example in detail. Although there is nothing more complicated than 
what has appeared earlier in the text, it does make use of multivariable calculus and some subtle reasoning. 

Many instructors will recognize the statistic Z in Example 8.5.11 as an ancillary. In many examples, con- 
ditioning on an ancillary is one way of making confidence levels (and significance levels) more representative 
of the amount of information available. The concept of ancillarity is beyond the scope of this text, and it 
is not pursued in the example. The example merely raises the issue that available information like Z is not 
necessarily taken into account in reporting a confidence coefficient. This makes the connection between the 
statistical meaning of confidence and the colloquial meaning more tenuous. 

If one is using the software R, remember that qnorm and qt compute quantiles of normal and $t$ distributions.
These quantiles are ubiquitous in the construction of confidence intervals.


Solutions to Exercises 


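1. We need to show that

\[ \Pr\left[\bar{X}_n - \Phi^{-1}\left(\frac{1+\gamma}{2}\right)\frac{\sigma}{n^{1/2}} < \mu < \bar{X}_n + \Phi^{-1}\left(\frac{1+\gamma}{2}\right)\frac{\sigma}{n^{1/2}}\right] = \gamma. \qquad \text{(S.8.3)} \]

By subtracting $\bar{X}_n$ from all three sides of the above inequalities and then dividing all three sides by
$\sigma/n^{1/2} > 0$, we can rewrite the probability in (S.8.3) as

\[ \Pr\left[-\Phi^{-1}\left(\frac{1+\gamma}{2}\right) < \frac{\mu - \bar{X}_n}{\sigma/n^{1/2}} < \Phi^{-1}\left(\frac{1+\gamma}{2}\right)\right]. \]

The random variable $(\mu - \bar{X}_n)/(\sigma/n^{1/2})$ has a standard normal distribution no matter what $\mu$ and
$\sigma^2$ are. And the probability that a standard normal random variable is between $-\Phi^{-1}([1+\gamma]/2)$ and
$\Phi^{-1}([1+\gamma]/2)$ is $(1+\gamma)/2 - [1 - (1+\gamma)/2] = \gamma$.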


2. In this exercise, $\bar{X}_n = 3.0625$, $\sigma' = \left[\sum_{i=1}^{n} (X_i - \bar{X}_n)^2/(n-1)\right]^{1/2} = 0.5125$ and $\sigma'/n^{1/2} = 0.1812$. Therefore,
the shortest confidence interval for $\mu$ will have the form $3.0625 - 0.1812c < \mu < 3.0625 + 0.1812c$.
If a confidence coefficient $\gamma$ is to be used, then $c$ must satisfy the relation $\Pr(-c < U < c) = \gamma$, where
$U$ has the $t$ distribution with $n - 1 = 7$ degrees of freedom. By symmetry,

\[ \Pr(-c < U < c) = \Pr(U < c) - \Pr(U < -c) = \Pr(U < c) - [1 - \Pr(U < c)] = 2\Pr(U < c) - 1. \]

As in the text, we find that $c$ must be the $(1+\gamma)/2$ quantile of the $t$ distribution with 7 degrees of
freedom.


(a) Here $\gamma = 0.90$, so $(1+\gamma)/2 = 0.95$. It is found from a table of the $t$ distribution with 7 degrees of
freedom that $c = 1.895$. Therefore, the confidence interval for $\mu$ has endpoints $3.0625 - (0.1812)(1.895) = 2.719$
and $3.0625 + (0.1812)(1.895) = 3.406$.


(b) Here $\gamma = 0.95$, $(1+\gamma)/2 = 0.975$, and $c = 2.365$. Therefore, the endpoints of the confidence
interval for $\mu$ are $3.0625 - (0.1812)(2.365) = 2.634$ and $3.0625 + (0.1812)(2.365) = 3.491$.

(c) Here $\gamma = 0.99$, $(1+\gamma)/2 = 0.995$, and $c = 3.499$. Therefore, the endpoints of the interval are 2.428
and 3.697.


One obvious feature of this exercise that should be emphasized is that the larger the confidence
coefficient $\gamma$, the wider the confidence interval must be.
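
The three sets of endpoints follow from the same formula with different quantiles. The sketch below recomputes them; the quantiles are hard-coded from the table values quoted in the solution rather than computed, and the variable names are illustrative assumptions.

```python
xbar = 3.0625   # sample mean from Exercise 2
se = 0.1812     # sigma' / n**0.5 from Exercise 2

# (1 + gamma)/2 quantiles of the t distribution with 7 degrees of freedom,
# copied from the table values quoted in the solution.
t_quantiles = {0.90: 1.895, 0.95: 2.365, 0.99: 3.499}

intervals = {
    gamma: (round(xbar - se * c, 3), round(xbar + se * c, 3))
    for gamma, c in t_quantiles.items()
}

# Interval widths, to illustrate that a larger gamma gives a wider interval.
widths = [hi - lo for lo, hi in intervals.values()]
```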


3. The endpoints of the confidence interval are $\bar{X}_n - c\sigma'/n^{1/2}$ and $\bar{X}_n + c\sigma'/n^{1/2}$. Therefore, $L = 2c\sigma'/n^{1/2}$
and $L^2 = 4c^2\sigma'^2/n$. Since

\[ W = \frac{(n-1)\sigma'^2}{\sigma^2} \]

has the $\chi^2$ distribution with $n - 1$ degrees of freedom, $E(W) = n - 1$. Therefore, $E(\sigma'^2) = E(\sigma^2 W/[n-1]) = \sigma^2$.
It follows that $E(L^2) = 4c^2\sigma^2/n$. As in the text, $c$ must be the $(1+\gamma)/2$ quantile of the $t$
distribution with $n - 1$ degrees of freedom.


(a) Here, $(1+\gamma)/2 = 0.975$. Therefore, from a table of the $t$ distribution with $n - 1 = 4$ degrees of
freedom it is found that $c = 2.776$. Hence, $c^2 = 7.706$ and $E(L^2) = 4(7.706)\sigma^2/5 = 6.16\sigma^2$.

(b) For the $t$ distribution with 9 degrees of freedom, $c = 2.262$. Hence, $E(L^2) = 2.05\sigma^2$.

(c) Here, $c = 2.045$ and $E(L^2) = 0.56\sigma^2$.


It should be noted from parts (a), (b), and (c) that for a fixed value of $\gamma$, $E(L^2)$ decreases as the
sample size $n$ increases.


(d) Here, $\gamma = 0.90$, so $(1+\gamma)/2 = 0.95$. It is found that $c = 1.895$. Hence, $E(L^2) = 4(1.895)^2\sigma^2/8 =
1.80\sigma^2$.


(e) Here, $\gamma = 0.95$, so $(1+\gamma)/2 = 0.975$ and $c = 2.365$. Hence, $E(L^2) = 2.80\sigma^2$.
(f) Here, $\gamma = 0.99$, so $(1+\gamma)/2 = 0.995$ and $c = 3.499$. Hence, $E(L^2) = 6.12\sigma^2$.


It should be noted from parts (d), (e), and (f) that for a fixed sample size $n$, $E(L^2)$ increases as $\gamma$
increases.


4. Since $\sqrt{n}(\bar{X}_n - \mu)/\sigma$ has a standard normal distribution,

\[ \Pr\left[-1.96 < \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} < 1.96\right] = 0.95. \]

This relation can be rewritten in the form

\[ \Pr\left(\bar{X}_n - \frac{1.96\sigma}{\sqrt{n}} < \mu < \bar{X}_n + \frac{1.96\sigma}{\sqrt{n}}\right) = 0.95. \]

Therefore, the interval with endpoints $\bar{X}_n - 1.96\sigma/\sqrt{n}$ and $\bar{X}_n + 1.96\sigma/\sqrt{n}$ will be a confidence interval
for $\mu$ with confidence coefficient 0.95. The length of this interval will be $3.92\sigma/\sqrt{n}$. It now follows that
$3.92\sigma/\sqrt{n} < 0.01\sigma$ if and only if $\sqrt{n} > 392$. This means that $n > 153664$, or $n = 153665$ or more.


5. Since $\sum_{i=1}^{n} (X_i - \bar{X}_n)^2/\sigma^2$ has a $\chi^2$ distribution with $n - 1$ degrees of freedom, it is possible to find
constants $c_1$ and $c_2$ which satisfy the relation given in the hint for this exercise. (As explained in this
section, there are an infinite number of different pairs of values of $c_1$ and $c_2$ that might be used.) The
relation given in the hint can be rewritten in the form

\[ \Pr\left[\frac{1}{c_2} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2 < \sigma^2 < \frac{1}{c_1} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2\right] = \gamma. \]

Therefore, the interval with endpoints equal to the observed values of $\sum_{i=1}^{n} (X_i - \bar{X}_n)^2/c_2$ and
$\sum_{i=1}^{n} (X_i - \bar{X}_n)^2/c_1$ will be a confidence interval for $\sigma^2$ with confidence coefficient $\gamma$.


6. The exponential distribution with mean $\mu$ is the same as the gamma distribution with $\alpha = 1$ and
$\beta = 1/\mu$. Therefore, by Theorem 5.7.7, $\sum_{i=1}^{n} X_i$ will have the gamma distribution with parameters
$\alpha = n$ and $\beta = 1/\mu$. In turn, it follows from Exercise 1 of Sec. 5.7 that $\sum_{i=1}^{n} X_i/\mu$ has the gamma
distribution with parameters $\alpha = n$ and $\beta = 1$. It follows from Definition 8.2.1 that $2\sum_{i=1}^{n} X_i/\mu$ has
the $\chi^2$ distribution with $2n$ degrees of freedom. Constants $c_1$ and $c_2$ which satisfy the relation given
in the hint for this exercise will then each be 1/2 times some quantile of the $\chi^2$ distribution with $2n$
degrees of freedom. There are an infinite number of pairs of values of such quantiles, one corresponding
to each pair of numbers $q_1 \ge 0$ and $q_2 \ge 0$ such that $q_2 - q_1 = \gamma$. For example, with $q_1 = (1-\gamma)/2$
and $q_2 = (1+\gamma)/2$ we can let $c_i$ be 1/2 times the $q_i$ quantile of the $\chi^2$ distribution with $2n$ degrees of
freedom for $i = 1, 2$. It now follows that

\[ \Pr\left(\frac{1}{c_2} \sum_{i=1}^{n} X_i < \mu < \frac{1}{c_1} \sum_{i=1}^{n} X_i\right) = \gamma. \]

Therefore, the interval with endpoints equal to the observed values of $\sum_{i=1}^{n} X_i/c_2$ and $\sum_{i=1}^{n} X_i/c_1$ will be
a confidence interval for $\mu$ with confidence coefficient $\gamma$.
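
The construction above can be turned into a short computation with the equal-tailed choice $q_1 = (1-\gamma)/2$ and $q_2 = (1+\gamma)/2$. Because $2n$ is always even, the closed-form $\chi^2$ c.d.f. for even degrees of freedom suffices. The sketch below is illustrative: the function names are hypothetical and the data are made up for the example, not taken from the text.

```python
import math

def chi2_cdf_even(x, df):
    """c.d.f. of the chi-square distribution when df is an even integer."""
    if x <= 0:
        return 0.0
    half = x / 2
    return 1 - math.exp(-half) * sum(
        half**j / math.factorial(j) for j in range(df // 2)
    )

def chi2_quantile_even(prob, df):
    """Quantile by bisection on the c.d.f. (even df only)."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, df) < prob:   # expand until the quantile is bracketed
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_cdf_even(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exponential_mean_interval(xs, gamma):
    """Equal-tailed coefficient-gamma interval for the mean of an exponential sample."""
    n, total = len(xs), sum(xs)
    c1 = chi2_quantile_even((1 - gamma) / 2, 2 * n) / 2
    c2 = chi2_quantile_even((1 + gamma) / 2, 2 * n) / 2
    return total / c2, total / c1

sample = [1.0, 3.0, 0.5, 2.5, 3.0]   # hypothetical data with sum 10
lower, upper = exponential_mean_interval(sample, 0.90)
```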


7. The average of the $n = 20$ values is $\bar{x}_n = 156.85$, and $\sigma' = 22.64$. The appropriate $t$ distribution quantile
is $T_{19}^{-1}(0.95) = 1.729$. The endpoints of the confidence interval are then $156.85 \pm 22.64 \times 1.729/20^{1/2}$.
Completing the calculation, we get the interval (148.1, 165.6).


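8. According to (8.5.15), $\Pr(|\bar{X}_2 - \theta| < 0.3 \mid Z = 0.9) = 1$, because $0.3 > (1 - 0.9)/2$. Since $Z = 0.9$, we
know that the interval between $X_1$ and $X_2$ covers a length of 0.9 in the interval $[\theta - 1/2, \theta + 1/2]$. Hence
$\bar{X}_2$ has to lie between

\[ \frac{(\theta - 1/2) + (\theta - 1/2 + 0.9)}{2} = \theta - 0.05 \quad \text{and} \quad \frac{(\theta + 1/2 - 0.9) + (\theta + 1/2)}{2} = \theta + 0.05. \]

Hence $\bar{X}_2$ must be within 0.05 of $\theta$, hence well within 0.3 of $\theta$.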
9. (a) The interval between the smaller and larger values is (4.7, 5.3).

(b) The values of $\theta$ consistent with the observed data are those between $5.3 - 0.5 = 4.8$ and $4.7 + 0.5 =
5.2$.

(c) The interval in part (a) contains the set of all possible $\theta$ values, hence it is larger than the set of
possible $\theta$ values.

(d) The value of $Z$ is $5.3 - 4.7 = 0.6$.

(e) According to (8.5.15),

\[ \Pr(|\bar{X}_2 - \theta| < 0.1 \mid Z = 0.6) = \frac{2 \times 0.1}{1 - 0.6} = 0.5. \]


10. (a) The likelihood function is

\[ f(x \mid \theta) = \begin{cases} 1 & \text{if } 4.8 < \theta < 5.2, \\ 0 & \text{otherwise.} \end{cases} \]

(See the solution to Exercise 9(b) to see how the numbers 4.8 and 5.2 arise.) The posterior p.d.f.
of $\theta$ is proportional to this likelihood times the prior p.d.f., hence the posterior p.d.f. is

\[ \begin{cases} c \exp(-0.1\theta) & \text{if } 4.8 < \theta < 5.2, \\ 0 & \text{otherwise,} \end{cases} \]

where $c$ is a constant that makes this function into a p.d.f. The constant must satisfy

\[ c \int_{4.8}^{5.2} \exp(-0.1\theta)\,d\theta = 1. \]

Since the integral above equals $10[\exp(-0.48) - \exp(-0.52)] = 0.2426$, we must have $c = 1/0.2426 =
4.122$.

(b) The observed value of $\bar{X}_2$ is $\bar{x}_2 = 5$. So, the posterior probability that $|\theta - \bar{x}_2| < 0.1$ is

\[ \int_{4.9}^{5.1} 4.122 \exp(-0.1\theta)\,d\theta = 41.22[\exp(-0.49) - \exp(-0.51)] = 0.5. \]

(c) Since the interval in part (a) of Exercise 9 contains the entire set of possible $\theta$ values, the posterior
probability that $\theta$ lies in that interval is 1.

(d) The posterior p.d.f. of $\theta$ is almost constant over the interval (4.8, 5.2), hence the c.d.f. will be
almost linear. The function in (8.5.15) is also linear. Indeed, for $c \le 0.2$, the posterior probability
of $|\theta - 5| < c$ equals

\[ \int_{5-c}^{5+c} 4.122 \exp(-0.1\theta)\,d\theta = 41.22 \exp(-0.5)[\exp(0.1c) - \exp(-0.1c)] \approx 25 \times 2 \times 0.1c = 5c. \]

Since $z = 0.6$ in this example, $5c = 2c/(1 - z)$, the same as (8.5.15).
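
The normalizing constant and the posterior probability in Exercise 10 come from the same closed-form integral of an exponential function. The sketch below recomputes both; the helper name `integral_exp` is an assumption introduced for illustration.

```python
import math

def integral_exp(lo, hi, rate=0.1):
    """Integral of exp(-rate * theta) over the interval (lo, hi)."""
    return (math.exp(-rate * lo) - math.exp(-rate * hi)) / rate

norm_const = 1 / integral_exp(4.8, 5.2)           # the constant c
post_prob = norm_const * integral_exp(4.9, 5.1)   # Pr(|theta - 5| < 0.1 | data)
```

Because the posterior p.d.f. is nearly flat on (4.8, 5.2), `post_prob` is very close to the ratio of interval lengths, 0.2/0.4.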


11. The variance stabilizing transformation is $\alpha(x) = \arcsin(x^{1/2})$, and the approximate distribution of
$\alpha(\bar{X}_n)$ is the normal distribution with mean $\alpha(p)$ and variance $1/n$. So,

\[ \Pr\left(\arcsin(\bar{X}_n^{1/2}) - \Phi^{-1}([1+\gamma]/2)\,n^{-1/2} < \arcsin p^{1/2} < \arcsin(\bar{X}_n^{1/2}) + \Phi^{-1}([1+\gamma]/2)\,n^{-1/2}\right) \approx \gamma. \]

This would make the interval with endpoints

\[ \arcsin(\bar{x}_n^{1/2}) \pm \Phi^{-1}([1+\gamma]/2)\,n^{-1/2} \qquad \text{(S.8.4)} \]


Copyright © 2012 Pearson Education, Inc. Publishing as Addison-Wesley. 


an approximate coefficient $\gamma$ confidence interval for $\arcsin(p^{1/2})$. The transformation $\alpha(x)$ has an inverse
$\alpha^{-1}(y) = \sin^2(y)$ for $0 \le y \le \pi/2$. If the endpoints in (S.8.4) are between 0 and $\pi/2$, then the interval
with endpoints

\[ \sin^2\left(\arcsin(\bar{x}_n^{1/2}) \pm \Phi^{-1}([1+\gamma]/2)\,n^{-1/2}\right) \qquad \text{(S.8.5)} \]

will be an approximate coefficient $\gamma$ confidence interval for $p$. If the lower endpoint in (S.8.4) is negative,
replace the lower endpoint in (S.8.5) by 0. If the upper endpoint in (S.8.4) is greater than $\pi/2$, replace
the upper endpoint in (S.8.5) by 1. With these modifications, the interval with the endpoints in (S.8.5)
becomes an approximate coefficient $\gamma$ confidence interval for $p$.


12. For this part of the proof, we define

$$A = r\left(G^{-1}(\gamma_2), X\right), \qquad B = r\left(G^{-1}(\gamma_1), X\right).$$

If $r(v, x)$ is strictly decreasing in $v$ for each $x$, we have

$$V(X, \theta) < c \text{ if and only if } g(\theta) > r(c, X). \tag{S.8.6}$$

Let $c = G^{-1}(\gamma_i)$ in Eq. (S.8.6) for each of $i = 1, 2$ to obtain

$$\Pr(g(\theta) > B) = \gamma_1, \qquad \Pr(g(\theta) > A) = \gamma_2. \tag{S.8.7}$$

Because $V$ has a continuous distribution and $r$ is strictly decreasing,

$$\Pr(A = g(\theta)) = \Pr(V(X, \theta) = G^{-1}(\gamma_2)) = 0,$$

and similarly $\Pr(B = g(\theta)) = 0$. The two equations in (S.8.7) combine to give $\Pr(A < g(\theta) < B) = \gamma$.


8.6 Bayesian Analysis of Samples from a Normal Distribution 


Commentary 


Obviously, this section should only be covered by those who are treating Bayesian topics. One might find it
useful to discuss the interpretation of the prior hyperparameters in terms of amount of information and prior
estimates. In this sense $\lambda_0$ and $2\alpha_0$ represent amounts of prior information about the mean and variance
respectively, while $\mu_0$ and $\beta_0/\alpha_0$ are prior estimates of the mean and variance respectively. The corresponding
posterior estimates are then weighted averages of the prior estimates and data-based estimates with weights
equal to the amounts of information. The posterior estimate of variance, namely $\beta_1/\alpha_1$, is the weighted
average of $\beta_0/\alpha_0$ (with weight $2\alpha_0$), $\sigma'^2$ (with weight $n - 1$), and $n\lambda_0(\overline{x}_n - \mu_0)^2/(\lambda_0 + n)$ (with weight 1).
This last term results from the fact that the prior distribution of the mean depends on variance (precision),
hence how far $\overline{x}_n$ is from $\mu_0$ tells us something about the variance also.

If one is using the software R, the functions qt and pt respectively compute the quantile function and 
c.d.f. of a t distribution. These functions can replace the use of tables for some of the calculations done in 
this section and in the exercises. 
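The weighted-average interpretation described above can be checked with a short computation. The sketch below is in Python rather than R, and the function name `normal_gamma_update` and the numerical inputs are illustrative assumptions; the update formulas are the ones used throughout the solutions in this section, with `s2` denoting the sum of squared deviations from the sample mean.

```python
def normal_gamma_update(mu0, lam0, alpha0, beta0, n, xbar, s2):
    """Posterior hyperparameters of a normal-gamma prior.

    s2 is the sum of squared deviations of the n observations from xbar.
    """
    lam1 = lam0 + n
    mu1 = (lam0 * mu0 + n * xbar) / lam1
    alpha1 = alpha0 + n / 2.0
    beta1 = beta0 + 0.5 * s2 + n * lam0 * (xbar - mu0) ** 2 / (2.0 * lam1)
    return mu1, lam1, alpha1, beta1

# Illustrative (assumed) prior hyperparameters and data summaries.
mu0, lam0, alpha0, beta0 = 3.5, 2.0, 2.0, 1.0
n, xbar, s2 = 11, 7.2, 20.3

mu1, lam1, alpha1, beta1 = normal_gamma_update(mu0, lam0, alpha0, beta0, n, xbar, s2)

# Weighted average of the three variance estimates with weights 2*alpha0, n-1, and 1.
sigma_prime_sq = s2 / (n - 1)
shift_term = n * lam0 * (xbar - mu0) ** 2 / (lam0 + n)
weighted = (2 * alpha0 * (beta0 / alpha0) + (n - 1) * sigma_prime_sq + shift_term) / (
    2 * alpha0 + (n - 1) + 1
)

print(mu1, lam1, alpha1, beta1)
print(beta1 / alpha1, weighted)  # the two values should coincide
```

The agreement of the last two printed values is the algebraic identity behind the commentary: the weights $2\alpha_0$, $n - 1$, and 1 sum to $2\alpha_1$.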
Solutions to Exercises


1. Since $X$ has the normal distribution with mean $\mu$ and variance $1/\tau$, we know that $Y$ has the normal
distribution with mean $a\mu + b$ and variance $a^2/\tau$. Therefore, the precision of $Y$ is $\tau/a^2$.


2. This exercise is merely a restatement of Theorem 7.3.3 with $\theta$ replaced by $\mu$, $\sigma^2$ replaced by $1/\tau$,
$\mu$ replaced by $\mu_0$, and $v^2$ replaced by $1/\lambda_0$. The precision of the posterior distribution is the reciprocal
of the variance of the posterior distribution given in that theorem.


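3. The joint p.d.f. $f_n(\boldsymbol{x} \mid \tau)$ of $X_1, \ldots, X_n$ is given shortly after Definition 8.6.1 in the text, and the prior
p.d.f. $\xi(\tau)$ is proportional to the expression $\xi_2(\tau)$ in the proof of Theorem 8.6.1. Therefore, the posterior
p.d.f. of $\tau$ satisfies the following relation:

$$\xi(\tau \mid \boldsymbol{x}) \propto f_n(\boldsymbol{x} \mid \tau)\, \xi(\tau) \propto \tau^{n/2} \exp\left\{-\left[\frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^2\right] \tau\right\} \tau^{\alpha_0 - 1} \exp(-\beta_0 \tau)$$

$$= \tau^{\alpha_0 + (n/2) - 1} \exp\left\{-\left[\beta_0 + \frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^2\right] \tau\right\}.$$

It can now be seen that this posterior p.d.f. is, except for a constant factor, the p.d.f. of the gamma
distribution specified in the exercise.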


4. The posterior distribution of $\tau$, after using the usual improper prior, is the gamma distribution with
parameters $(n - 1)/2$ and $s_n^2/2$. Now, $V$ is a constant $(n - 1)\sigma'^2$ times $\tau$, so $V$ has the gamma distribution
with parameters $(n - 1)/2$ and $(s_n^2/2)/[(n - 1)\sigma'^2] = 1/2$. This last gamma distribution is also known
as the $\chi^2$ distribution with $n - 1$ degrees of freedom.


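5. Since $E(\tau) = \alpha_0/\beta_0 = 1/2$ and $\mathrm{Var}(\tau) = \alpha_0/\beta_0^2 = 1/8$, then $\alpha_0 = 2$ and $\beta_0 = 4$. Also, $\mu_0 = E(\mu) = -5$.
Finally, $\mathrm{Var}(\mu) = \beta_0/[\lambda_0(\alpha_0 - 1)] = 1$. Therefore, $\lambda_0 = 4$.


6. Since $E(\tau) = \alpha_0/\beta_0 = 1/2$ and $\mathrm{Var}(\tau) = \alpha_0/\beta_0^2 = 1/4$, then $\alpha_0 = 1$ and $\beta_0 = 2$. But $\mathrm{Var}(\mu)$ is finite
only if $\alpha_0 > 1$.


7. Since $E(\tau) = \alpha_0/\beta_0 = 1$ and $\mathrm{Var}(\tau) = \alpha_0/\beta_0^2 = 4$, then $\alpha_0 = \beta_0 = 1/4$. But $E(\mu)$ exists only if
$\alpha_0 > 1/2$.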


8. It follows from Theorem 8.6.2 that the random variable $U = (\mu - 4)/4$ has the $t$ distribution with
$2\alpha_0 = 2$ degrees of freedom.
(a) $\Pr(\mu > 0) = \Pr(U > -1) = \Pr(U < 1) = 0.79$.
(b)

$$\Pr(0.736 < \mu < 15.680) = \Pr(-0.816 < U < 2.920)$$
$$= \Pr(U < 2.920) - \Pr(U < -0.816)$$
$$= \Pr(U < 2.920) - [1 - \Pr(U < 0.816)]$$
$$= 0.95 - (1 - 0.75) = 0.70.$$
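The two probabilities in this solution can be reproduced without tables, because the $t$ distribution with 2 degrees of freedom has the closed-form c.d.f. $F(t) = 1/2 + t/[2(2 + t^2)^{1/2}]$. The following Python sketch relies on that standard fact; the helper names `t2_cdf` and `to_u` are ours.

```python
import math

def t2_cdf(t):
    """c.d.f. of the t distribution with 2 degrees of freedom (closed form)."""
    return 0.5 + t / (2.0 * math.sqrt(2.0 + t * t))

def to_u(mu):
    """Standardize mu as in the solution: U = (mu - 4) / 4."""
    return (mu - 4.0) / 4.0

# Part (a): Pr(mu > 0) = Pr(U > -1).
prob_a = 1.0 - t2_cdf(to_u(0.0))

# Part (b): Pr(0.736 < mu < 15.680).
prob_b = t2_cdf(to_u(15.680)) - t2_cdf(to_u(0.736))

print(round(prob_a, 2), round(prob_b, 2))
```

For other degrees of freedom no such simple formula is available, and one would use the table in the book or software such as the R functions `pt` and `qt` mentioned in the commentary.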


9. (a) The posterior hyperparameters are computed in the example. The degrees of freedom are $2\alpha_1 = 22$,
so the quantile from the $t$ distribution is $T_{22}^{-1}([1 + .9]/2) = 1.717$, and the interval is

$$\mu_1 \pm 1.717 \left(\frac{\beta_1}{\alpha_1 \lambda_1}\right)^{1/2} = 183.95 \pm 1.717 \left(\frac{\beta_1}{11 \times 20}\right)^{1/2} = (157.83, 210.07).$$

(b) This interval has endpoints $182.17 \pm (88678.5/[17 \times 18])^{1/2}\, T_{17}^{-1}(0.95)$. With $T_{17}^{-1}(0.95) = 1.740$, we
get the interval $(152.55, 211.79)$.


10. Since $E(\tau) = \alpha_0/\beta_0 = 2$ and

$$\mathrm{Var}(\tau) = \frac{\alpha_0}{\beta_0^2} = E(\tau^2) - [E(\tau)]^2 = 1,$$

then $\alpha_0 = 4$ and $\beta_0 = 2$. Also $\mu_0 = E(\mu) = 0$. Therefore, by Eq. (14), $Y = (2\lambda_0)^{1/2} \mu$ has a $t$ distribution
with $2\alpha_0 = 8$ degrees of freedom. It is found from a table of the $t$ distribution that $\Pr(|Y| < 0.706) = 0.5$.
Therefore,

$$\Pr\left(|\mu| < \frac{0.706}{(2\lambda_0)^{1/2}}\right) = 0.5.$$

It now follows from the condition given in the exercise that $0.706/(2\lambda_0)^{1/2} = 1.412$. Hence, $\lambda_0 = 1/8$.


11. It follows from Theorem 8.6.1 that $\mu_1 = 80/81$, $\lambda_1 = 81/8$, $\alpha_1 = 9$, and $\beta_1 = 491/81$. Therefore, if Eq.
(8.6.9) is applied to this posterior distribution, it is seen that the random variable $U = (3.877)(\mu - 0.988)$
has the $t$ distribution with 18 degrees of freedom. Therefore, it is found from a table that $\Pr(-2.101 < U <
2.101) = 0.95$. An equivalent statement is $\Pr(0.446 < \mu < 1.530) = 0.95$. This interval will be the
shortest one having probability 0.95 because the center of the interval is $\mu_1$, the point where the p.d.f.
of $\mu$ is a maximum. Since the p.d.f. of $\mu$ decreases as we move away from $\mu_1$ in either direction, it
follows that an interval having given length will have the maximum probability when it is centered at
$\mu_1$.


12. Since $E(\tau) = \alpha_0/\beta_0 = 1$ and $\mathrm{Var}(\tau) = \alpha_0/\beta_0^2 = 1/3$, it follows that $\alpha_0 = \beta_0 = 3$. Also, since
the distribution of $\mu$ is symmetric with respect to $\mu_0$ and we are given that $\Pr(\mu > 3) = 0.5$, then
$\mu_0 = 3$. Now, by Theorem 8.6.2, $U = \lambda_0^{1/2}(\mu - 3)$ has the $t$ distribution with $2\alpha_0 = 6$ degrees of
freedom. It is found from a table that $\Pr(U < 1.440) = 0.90$. Therefore, $\Pr(U > -1.440) = 0.90$ and
it follows that

$$\Pr\left(\mu > 3 - \frac{1.440}{\lambda_0^{1/2}}\right) = 0.90.$$

It now follows from the condition given in the exercise that $3 - 1.440/\lambda_0^{1/2} = 0.12$. Hence, $\lambda_0 = 1/4$.


13. It follows from Theorem 8.6.1 that $\mu_1 = 67/33$, $\lambda_1 = 33/4$, $\alpha_1 = 7$, and $\beta_1 = 367/33$. In calculating
the value of $\beta_1$, we have used the relation

$$\sum_{i=1}^{n} (x_i - \overline{x}_n)^2 = \sum_{i=1}^{n} x_i^2 - n \overline{x}_n^2.$$

If Theorem 8.6.2 is now applied to this posterior distribution, it is seen that the random variable
$U = (2.279)(\mu - 2.030)$ has the $t$ distribution with 14 degrees of freedom. Therefore, it is found from a
table that $\Pr(-2.977 < U < 2.977) = 0.99$. An equivalent statement is $\Pr(0.724 < \mu < 3.336) = 0.99$.


14. The interval should run between the values $\mu_1 \pm (\beta_1/[\lambda_1 \alpha_1])^{1/2}\, T_{11}^{-1}(0.95)$. The values we need are
available from the example or the table of the $t$ distribution: $\mu_1 = 1.345$, $\beta_1 = 1.0484$, $\lambda_1 = 11$,
$\alpha_1 = 5.5$, and $T_{11}^{-1}(0.95) = 1.796$. The resulting interval is $(1.109, 1.581)$. This interval is a bit wider
than the confidence interval in Example 8.5.4. This is due mostly to the fact that $(\beta_1/\alpha_1)^{1/2}$ is somewhat
larger than $\sigma'$.
15. (a) The posterior hyperparameters are

$$\mu_1 = \frac{2 \times 3.5 + 11 \times 7.2}{2 + 11} = 6.63,$$
$$\lambda_1 = 2 + 11 = 13,$$
$$\alpha_1 = 2 + \frac{11}{2} = 7.5,$$
$$\beta_1 = 1 + \frac{1}{2}\left(20.3 + \frac{2 \times 11}{2 + 11}(7.2 - 3.5)^2\right) = 22.73.$$

(b) The interval should run between the values $\mu_1 \pm (\beta_1/[\lambda_1 \alpha_1])^{1/2}\, T_{15}^{-1}(0.975)$. From the table of the
$t$ distribution in the book, we obtain $T_{15}^{-1}(0.975) = 2.131$. The interval is then $(5.601, 7.659)$.


16. (a) The average of all 30 observations is $\overline{x}_{30} = 1.442$ and $s_{30}^2 = 2.671$. Using the prior from Example 8.6.2,
we obtain

$$\mu_1 = \frac{1 \times 1 + 30 \times 1.442}{1 + 30} = 1.428,$$
$$\lambda_1 = 1 + 30 = 31,$$
$$\alpha_1 = 0.5 + \frac{30}{2} = 15.5,$$
$$\beta_1 = 0.5 + \frac{1}{2}\left(2.671 + \frac{1 \times 30}{1 + 30}(1.442 - 1)^2\right) = 1.930.$$

The posterior distribution of $\mu$ and $\tau$ is a joint normal-gamma distribution with the above
hyperparameters.
(b) The average of the 20 new observations is $\overline{x}_{20} = 1.474$ and $s_{20}^2 = 1.645$. Using the posterior in
Example 8.6.2 as the prior, we obtain the hyperparameters

$$\mu_1 = \frac{11 \times 1.345 + 20 \times 1.474}{11 + 20} = 1.428,$$
$$\lambda_1 = 11 + 20 = 31,$$
$$\alpha_1 = 5.5 + \frac{20}{2} = 15.5,$$
$$\beta_1 = 1.0484 + \frac{1}{2}\left(1.645 + \frac{11 \times 20}{11 + 20}(1.474 - 1.345)^2\right) = 1.930.$$

The posterior hyperparameters are the same as those found in part (a). Indeed, one can prove
that they must be the same when one updates sequentially or all at once.
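The claim that sequential and all-at-once updating agree can be illustrated on raw data. In the Python sketch below, the function `update` and the data values are made up for illustration (they are not the observations from the example); the update formulas are the ones used in the solutions above.

```python
def update(prior, data):
    """Normal-gamma update of (mu, lambda, alpha, beta) from a list of observations."""
    mu0, lam0, alpha0, beta0 = prior
    n = len(data)
    xbar = sum(data) / n
    s2 = sum((x - xbar) ** 2 for x in data)  # sum of squared deviations
    lam1 = lam0 + n
    mu1 = (lam0 * mu0 + n * xbar) / lam1
    alpha1 = alpha0 + n / 2.0
    beta1 = beta0 + 0.5 * s2 + n * lam0 * (xbar - mu0) ** 2 / (2.0 * lam1)
    return (mu1, lam1, alpha1, beta1)

prior = (1.0, 1.0, 0.5, 0.5)
data = [0.9, 1.7, 1.2, 2.4, 0.3, 1.1, 1.9, 1.4]  # hypothetical observations

# All observations at once.
batch = update(prior, data)

# First three observations, then the posterior serves as the prior for the rest.
sequential = update(update(prior, data[:3]), data[3:])

print(batch)
print(sequential)
```

The two printed tuples should agree up to floating-point rounding, for any split of the data.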


17. Using just the first ten observations, we have $\overline{x}_n = 1.379$ and $s_n^2 = 0.9663$. This makes $\mu_1 = 1.379$,
$\lambda_1 = 10$, $\alpha_1 = 4.5$, and $\beta_1 = 0.4831$. The posterior distribution of $\mu$ and $\tau$ is the normal-gamma
distribution with these hyperparameters.


18. Now, we use the hyperparameters found in Exercise 17 as prior hyperparameters and combine these
with the last 20 observations. The average of the 20 new observations is $\overline{x}_{20} = 1.474$ and $s_{20}^2 = 1.645$.
We then obtain

$$\mu_1 = \frac{10 \times 1.379 + 20 \times 1.474}{10 + 20} = 1.442,$$
$$\lambda_1 = 10 + 20 = 30,$$
$$\alpha_1 = 4.5 + \frac{20}{2} = 14.5,$$
$$\beta_1 = 0.4831 + \frac{1}{2}\left(1.645 + \frac{10 \times 20}{10 + 20}(1.474 - 1.379)^2\right) = 1.336.$$

Comparing two sets of hyperparameters is not as informative as comparing inferences. For example, a
posterior probability interval will be centered at $\mu_1$ and have half-width proportional to $(\beta_1/[\alpha_1 \lambda_1])^{1/2}$.
Since $\mu_1$ is nearly the same in this case and in Exercise 16 part (b), the two intervals will be centered
in about the same place. The values of $(\beta_1/[\alpha_1 \lambda_1])^{1/2}$ for this exercise and for Exercise 16 part (b) are
respectively 0.05542 and 0.06338. So we expect the intervals to be slightly shorter in this exercise than
in Exercise 16. (However, the quantiles of the $t$ distribution with 29 degrees of freedom are a bit larger
in this exercise than the quantiles of the $t$ distribution with 31 degrees of freedom in Exercise 16.)


19. (a) For the 20 observations given in Exercise 7 of Sec. 8.5, the data summaries are $\overline{x}_n = 156.85$ and
$s_n^2 = 9740.55$. So, the posterior hyperparameters are

$$\mu_1 = \frac{0.5 \times 150 + 20 \times 156.85}{0.5 + 20} = 156.68,$$
$$\lambda_1 = 0.5 + 20 = 20.5,$$
$$\alpha_1 = 1 + \frac{20}{2} = 11,$$
$$\beta_1 = 4 + \frac{1}{2}\left(9740.55 + \frac{0.5 \times 20}{0.5 + 20}(156.85 - 150)^2\right) = 4885.7.$$

The joint posterior of $\mu$ and $\tau$ is the normal-gamma distribution with these hyperparameters.
(b) The interval we want has endpoints $\mu_1 \pm (\beta_1/[\alpha_1 \lambda_1])^{1/2}\, T_{22}^{-1}(0.95)$. The quantile we want is
$T_{22}^{-1}(0.95) = 1.717$. Substituting the posterior hyperparameters gives the endpoints to be $a =
148.69$ and $b = 164.7$.


20. The data summaries in Example 7.3.10 are $n = 20$, $\overline{x}_{20} = 0.125$. Combine these with $s_{20}^2 = 2102.9$ to
get the posterior hyperparameters:

$$\mu_1 = \frac{1 \times 0 + 20 \times 0.125}{1 + 20} = 0.1190,$$
$$\lambda_1 = 1 + 20 = 21,$$
$$\alpha_1 = 1 + \frac{20}{2} = 11,$$
$$\beta_1 = 60 + \frac{2102.9}{2} + \frac{20 \times 1 \times (0.125 - 0)^2}{2(1 + 20)} = 1111.5.$$

(a) The posterior distribution of $(\mu, \tau)$ is the normal-gamma distribution with the posterior
hyperparameters given above.

(b) The posterior distribution of

$$\left(\frac{21 \times 11}{1111.5}\right)^{1/2} (\mu - 0.1190) = 0.4559(\mu - 0.1190) = T$$

is the $t$ distribution with 22 degrees of freedom. So,

$$\Pr(\mu > 1 \mid \boldsymbol{x}) = \Pr[0.4559(\mu - 0.1190) > 0.4559(1 - 0.1190)] = \Pr(T > 0.4016) = 0.3459,$$

where the final probability can be found by using statistical software or interpolating in the table
of the $t$ distribution.

8.7 Unbiased Estimators 


Commentary 


The subsection on limitations of unbiased estimators at the end of this section should be used selectively by 
instructors after gauging the ability of their students to understand examples with nonstandard structure. 
Solutions to Exercises


1. (a) The variance of a Poisson random variable with mean $\theta$ is also $\theta$. So the variance is $\sigma^2 = g(\theta) = \theta$.


(b) The M.L.E. of $g(\theta) = \theta$ was found in Exercise 5 of Sec. 7.5, and it equals $\overline{X}_n$. The mean of $\overline{X}_n$ is
the same as the mean of each $X_i$, namely $\theta$, hence the M.L.E. is unbiased.


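2. Let $E(X^k) = \beta_k$. Then

$$E\left(\frac{1}{n} \sum_{i=1}^{n} X_i^k\right) = \frac{1}{n} \sum_{i=1}^{n} E(X_i^k) = \frac{1}{n} \cdot n\beta_k = \beta_k.$$


3. By Exercise 2, $\delta_1 = \frac{1}{n} \sum_{i=1}^{n} X_i^2$ is an unbiased estimator of $E(X^2)$. Also, we know that
$\delta_2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \overline{X}_n)^2$ is an unbiased estimator of $\mathrm{Var}(X)$. Therefore, it follows from the hint for this
exercise that $\delta_1 - \delta_2$ will be an unbiased estimator of $[E(X)]^2$.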


4. If $X$ has the geometric distribution with parameter $p$, then it follows from Eq. (5.5.7) that $E(X) =
(1 - p)/p = 1/p - 1$. Therefore, $E(X + 1) = 1/p$, which implies that $X + 1$ is an unbiased estimator of
$1/p$.


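5. We shall follow the hint for this exercise. If $E[\delta(X)] = \exp(\lambda)$, then

$$\exp(\lambda) = E[\delta(X)] = \sum_{x=0}^{\infty} \delta(x) f(x \mid \lambda) = \sum_{x=0}^{\infty} \frac{\delta(x) \exp(-\lambda) \lambda^x}{x!}.$$

Therefore,

$$\sum_{x=0}^{\infty} \frac{\delta(x) \lambda^x}{x!} = \exp(2\lambda) = \sum_{x=0}^{\infty} \frac{(2\lambda)^x}{x!} = \sum_{x=0}^{\infty} \frac{2^x \lambda^x}{x!}.$$

Since two power series in $\lambda$ can be equal only if the coefficients of $\lambda^x$ are equal for $x = 0, 1, 2, \ldots$, it
follows that $\delta(x) = 2^x$ for $x = 0, 1, 2, \ldots$. This argument also shows that this estimator $\delta(X)$ is the
unique unbiased estimator of $\exp(\lambda)$ in this problem.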


6. The M.S.E. of $\hat{\sigma}_0^2$ is given by Eq. (8.7.8) with $c = 1/n$ and it is, therefore, equal to $(2n - 1)\sigma^4/n^2$. The
M.S.E. of $\hat{\sigma}_1^2$ is given by Eq. (8.7.8) with $c = 1/(n - 1)$ and it is, therefore, equal to $2\sigma^4/(n - 1)$. Since
$(2n - 1)/n^2 < 2/(n - 1)$ for every positive integer $n$, it follows that the M.S.E. of $\hat{\sigma}_0^2$ is smaller than
the M.S.E. of $\hat{\sigma}_1^2$ for all values of $\mu$ and $\sigma^2$.
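The inequality between the two M.S.E. coefficients can be spot-checked with exact rational arithmetic. This is only a sanity check over a finite range, not a proof; the function name `mse_coefficients` is ours, and the two coefficients are the ones stated in the solution, in units of $\sigma^4$.

```python
from fractions import Fraction

def mse_coefficients(n):
    """Return the M.S.E. coefficients for c = 1/n and c = 1/(n - 1), in units of sigma^4."""
    return Fraction(2 * n - 1, n * n), Fraction(2, n - 1)

for n in (2, 5, 10, 100):
    biased, unbiased = mse_coefficients(n)
    print(n, float(biased), float(unbiased))

# The estimator with c = 1/n has the smaller M.S.E. for every n checked (n >= 2).
all_smaller = all(
    mse_coefficients(n)[0] < mse_coefficients(n)[1] for n in range(2, 1001)
)
print(all_smaller)
```

The algebra behind the check is that $(2n - 1)(n - 1) = 2n^2 - 3n + 1 < 2n^2$ for $n \ge 2$.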


7. For any possible values $x_1, \ldots, x_n$ of $X_1, \ldots, X_n$, let $y = \sum_{i=1}^{n} x_i$. Then

$$E[\delta(X_1, \ldots, X_n)] = \sum \delta(x_1, \ldots, x_n)\, p^y (1 - p)^{n-y},$$

where the summation extends over all possible values of $x_1, \ldots, x_n$. Since
$p^y (1 - p)^{n-y}$ is a polynomial in $p$ of degree $n$, it follows that $E[\delta(X_1, \ldots, X_n)]$ is the sum of a finite
number of terms, each of which is equal to a constant $\delta(x_1, \ldots, x_n)$ times a polynomial in $p$ of degree
$n$. Therefore, $E[\delta(X_1, \ldots, X_n)]$ must itself be a polynomial in $p$ of degree $n$ or less. The degree would
actually be less than $n$ if the sum of the terms of order $p^n$ is 0.
8. If $E[\delta(X)] = p$, then

$$p = E[\delta(X)] = \sum_{x=0}^{\infty} \delta(x)\, p (1 - p)^x.$$

Therefore, $\sum_{x=0}^{\infty} \delta(x)(1 - p)^x = 1$. Since this relation must be satisfied for all values of $1 - p$, it follows
that the constant term $\delta(0)$ in the power series must be equal to 1, and the coefficient $\delta(x)$ of $(1 - p)^x$
must be equal to 0 for $x = 1, 2, \ldots$.


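9. If $E[\delta(X)] = \exp(-2\lambda)$, then

$$\sum_{x=0}^{\infty} \frac{\delta(x) \exp(-\lambda) \lambda^x}{x!} = \exp(-2\lambda).$$

Therefore,

$$\sum_{x=0}^{\infty} \frac{\delta(x) \lambda^x}{x!} = \exp(-\lambda) = \sum_{x=0}^{\infty} \frac{(-1)^x \lambda^x}{x!}.$$

Therefore, $\delta(x) = (-1)^x$ or, in other words, $\delta(x) = 1$ if $x$ is even and $\delta(x) = -1$ if $x$ is odd.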
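The unbiasedness of the two power-series estimators (this one, and $\delta(x) = 2^x$ from Exercise 5) can be confirmed by summing the Poisson series numerically. The sketch below is illustrative: the function name `poisson_expectation`, the truncation point, and the value of $\lambda$ are assumptions made for the check.

```python
import math

def poisson_expectation(delta, lam, terms=200):
    """Approximate E[delta(X)] for X ~ Poisson(lam) by truncating the series."""
    total = 0.0
    pmf = math.exp(-lam)  # Pr(X = 0)
    for x in range(terms):
        total += delta(x) * pmf
        pmf *= lam / (x + 1)  # recursion: Pr(X = x + 1) from Pr(X = x)
    return total

lam = 1.3  # arbitrary illustrative value

# Exercise 9: delta(x) = (-1)^x should have expectation exp(-2 * lam).
alternating = poisson_expectation(lambda x: (-1) ** x, lam)

# Exercise 5: delta(x) = 2^x should have expectation exp(lam).
doubling = poisson_expectation(lambda x: 2 ** x, lam)

print(alternating, math.exp(-2 * lam))
print(doubling, math.exp(lam))
```

Note that $(-1)^X$ takes only the values $\pm 1$ even though it estimates a quantity between 0 and 1, which is the kind of behavior discussed in the subsection on limitations of unbiased estimators.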


10. Let $X$ denote the number of failures that are obtained before $k$ successes have been obtained. Then $X$
has the negative binomial distribution with parameters $k$ and $p$, and $N = X + k$. Therefore, by Eq.
(5.5.1),

$$E\left(\frac{k-1}{N-1}\right) = E\left(\frac{k-1}{X+k-1}\right) = \sum_{x=0}^{\infty} \frac{k-1}{x+k-1} \binom{x+k-1}{x} p^k (1-p)^x$$

$$= \sum_{x=0}^{\infty} \frac{(x+k-2)!}{x!\,(k-2)!}\, p^k (1-p)^x$$

$$= p \sum_{x=0}^{\infty} \binom{x+k-2}{x} p^{k-1} (1-p)^x.$$

But the final summation is the sum of the probabilities for a negative binomial distribution with
parameters $k - 1$ and $p$. Therefore, the value of this summation is 1, and $E([k - 1]/[N - 1]) = p$.


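11. (a) $E(\hat{\theta}) = \alpha E(\overline{X}_m) + (1 - \alpha)E(\overline{Y}_n) = \alpha\theta + (1 - \alpha)\theta = \theta$. Hence, $\hat{\theta}$ is an unbiased estimator of $\theta$ for
all values of $\alpha$, $m$ and $n$.


(b) Since the two samples are taken independently, $\overline{X}_m$ and $\overline{Y}_n$ are independent. Hence,

$$\mathrm{Var}(\hat{\theta}) = \alpha^2\, \mathrm{Var}(\overline{X}_m) + (1 - \alpha)^2\, \mathrm{Var}(\overline{Y}_n) = \alpha^2 \left(\frac{\sigma_A^2}{m}\right) + (1 - \alpha)^2 \left(\frac{\sigma_B^2}{n}\right).$$

Since $\sigma_A^2 = 4\sigma_B^2$, it follows that

$$\mathrm{Var}(\hat{\theta}) = \left[\frac{4\alpha^2}{m} + \frac{(1 - \alpha)^2}{n}\right] \sigma_B^2.$$

By differentiating the coefficient of $\sigma_B^2$, it is found that $\mathrm{Var}(\hat{\theta})$ is a minimum when $\alpha = m/(m + 4n)$.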
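The minimizing value of $\alpha$ can be checked against a brute-force grid search. In this Python sketch the sample sizes `m` and `n` are arbitrary illustrative choices and the function name `var_coefficient` is ours; the function is the coefficient of $\sigma_B^2$ in $\mathrm{Var}(\hat{\theta})$.

```python
def var_coefficient(alpha, m, n):
    """Coefficient of sigma_B^2 in Var(theta-hat): 4*alpha^2/m + (1 - alpha)^2/n."""
    return 4.0 * alpha * alpha / m + (1.0 - alpha) ** 2 / n

m, n = 10, 15  # hypothetical sample sizes

# Closed-form minimizer from the solution.
alpha_star = m / (m + 4.0 * n)

# Grid search over alpha in [0, 1] with step 0.001.
grid = [i / 1000.0 for i in range(1001)]
alpha_grid = min(grid, key=lambda a: var_coefficient(a, m, n))

print(alpha_star, alpha_grid)
```

Setting the derivative $8\alpha/m - 2(1 - \alpha)/n$ equal to 0 gives the same value, and the grid minimizer should match it to within the grid spacing.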
12. (a) Let $X$ denote the value of the characteristic for a person chosen at random from the total population,
and let $A_i$ denote the event that the person belongs to stratum $i$ ($i = 1, \ldots, k$).
Then

$$\mu = E(X) = \sum_{i=1}^{k} E(X \mid A_i) \Pr(A_i) = \sum_{i=1}^{k} \mu_i p_i.$$

Also,

$$E(\hat{\mu}) = \sum_{i=1}^{k} p_i E(\overline{X}_i) = \sum_{i=1}^{k} p_i \mu_i = \mu.$$

(b) Since the samples are taken independently of each other, the variables $\overline{X}_1, \ldots, \overline{X}_k$ are independent.
Therefore,

$$\mathrm{Var}(\hat{\mu}) = \sum_{i=1}^{k} p_i^2\, \mathrm{Var}(\overline{X}_i) = \sum_{i=1}^{k} \frac{p_i^2 \sigma_i^2}{n_i}.$$

Hence, the values of $n_1, \ldots, n_k$ must be chosen to minimize $v = \sum_{i=1}^{k} (p_i \sigma_i)^2 / n_i$, subject to the constraint
that $\sum_{i=1}^{k} n_i = n$. If we let $n_k = n - \sum_{i=1}^{k-1} n_i$, then

$$\frac{\partial v}{\partial n_i} = \frac{-(p_i \sigma_i)^2}{n_i^2} + \frac{(p_k \sigma_k)^2}{n_k^2} \quad \text{for } i = 1, \ldots, k - 1.$$

When each of these partial derivatives is set equal to 0, it is found that $n_i/(p_i \sigma_i)$ has the same value
for $i = 1, \ldots, k$. Therefore, $n_i = c p_i \sigma_i$ for some constant $c$. It follows that $n = \sum_{j=1}^{k} n_j = c \sum_{j=1}^{k} p_j \sigma_j$.
Hence, $c = n / \sum_{j=1}^{k} p_j \sigma_j$ and, in turn,

$$n_i = \frac{n p_i \sigma_i}{\sum_{j=1}^{k} p_j \sigma_j}.$$

This analysis ignores the fact that the values of $n_1, \ldots, n_k$ must be integers. The integers $n_1, \ldots, n_k$
for which $v$ is a minimum would presumably be near the minimizing values of $n_1, \ldots, n_k$ which
have just been found.


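13. (a) By Theorem 4.7.1,

$$E(\delta) = E[E(\delta \mid T)] = E(\delta_0).$$

Therefore, $\delta$ and $\delta_0$ have the same expectation. Since $\delta$ is unbiased, $E(\delta) = \theta$. Hence, $E(\delta_0) = \theta$
also. In other words, $\delta_0$ is also unbiased.


(b) Let $Y = \delta(X)$ and $X = T$ in Theorem 4.7.4. The result there implies that

$$\mathrm{Var}_\theta(\delta(X)) = \mathrm{Var}_\theta(\delta_0(X)) + E_\theta \mathrm{Var}(\delta(X) \mid T).$$

Since $\mathrm{Var}(\delta(X) \mid T) \ge 0$, so too is $E_\theta \mathrm{Var}(\delta(X) \mid T)$, so $\mathrm{Var}_\theta(\delta(X)) \ge \mathrm{Var}_\theta(\delta_0(X))$.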
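14. For $0 < y < \theta$, the c.d.f. of $Y_n$ is

$$F(y \mid \theta) = \Pr(Y_n \le y \mid \theta) = \Pr(X_1 \le y, \ldots, X_n \le y \mid \theta) = \left(\frac{y}{\theta}\right)^n.$$

Therefore, for $0 < y < \theta$, the p.d.f. of $Y_n$ is

$$f(y \mid \theta) = \frac{d}{dy} F(y \mid \theta) = \frac{n y^{n-1}}{\theta^n}.$$

It now follows that

$$E_\theta(Y_n) = \int_0^\theta y\, \frac{n y^{n-1}}{\theta^n}\, dy = \frac{n}{n+1}\theta.$$

Hence, $E_\theta([n + 1]Y_n/n) = \theta$, which means that $(n + 1)Y_n/n$ is an unbiased estimator of $\theta$.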
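15. (a) $f(1 \mid \theta) + f(2 \mid \theta) = \theta^2[\theta + (1 - \theta)] = \theta^2$,
$f(4 \mid \theta) + f(5 \mid \theta) = (1 - \theta)^2[\theta + (1 - \theta)] = (1 - \theta)^2$,
$f(3 \mid \theta) = 2\theta(1 - \theta)$.
The sum of the five probabilities on the left sides of these equations is equal to the sum of the right
sides, which is

$$\theta^2 + (1 - \theta)^2 + 2\theta(1 - \theta) = [\theta + (1 - \theta)]^2 = 1.$$

(b)
$$E_\theta[\delta_c(X)] = \sum_{x=1}^{5} \delta_c(x) f(x \mid \theta) = 1 \cdot \theta^3 + (2 - 2c)\theta^2(1 - \theta) + (c)2\theta(1 - \theta) + (1 - 2c)\theta(1 - \theta)^2 + 0.$$

It will be found that the sum of the coefficients of $\theta^3$ is 0, the sum of the coefficients of $\theta^2$ is 0,
the sum of the coefficients of $\theta$ is 1, and the constant term is 0. Hence, $E_\theta[\delta_c(X)] = \theta$.


(c) For every value of $c$,

$$\mathrm{Var}_{\theta_0}(\delta_c) = E_{\theta_0}(\delta_c^2) - [E_{\theta_0}(\delta_c)]^2 = E_{\theta_0}(\delta_c^2) - \theta_0^2.$$

Hence, the value of $c$ for which $\mathrm{Var}_{\theta_0}(\delta_c)$ is a minimum will be the value of $c$ for which $E_{\theta_0}(\delta_c^2)$ is
a minimum. Now

$$E_{\theta_0}(\delta_c^2) = (1)^2\theta_0^3 + (2 - 2c)^2\theta_0^2(1 - \theta_0) + (c)^2 2\theta_0(1 - \theta_0) + (1 - 2c)^2\theta_0(1 - \theta_0)^2 + 0$$
$$= 2c^2[2\theta_0^2(1 - \theta_0) + \theta_0(1 - \theta_0) + 2\theta_0(1 - \theta_0)^2] - 4c[2\theta_0^2(1 - \theta_0) + \theta_0(1 - \theta_0)^2] + \text{terms not involving } c.$$

After further simplification of the coefficients of $c^2$ and $c$, we obtain the relation

$$E_{\theta_0}(\delta_c^2) = 6\theta_0(1 - \theta_0)c^2 - 4\theta_0(1 - \theta_0^2)c + \text{terms not involving } c.$$

By differentiating with respect to $c$ and setting the derivative equal to 0, it is found that the value
of $c$ for which $E_{\theta_0}(\delta_c^2)$ is a minimum is $c = (1 + \theta_0)/3$.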


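16. The unbiased estimator in Exercise 3 is

$$\delta_1 - \delta_2 = \frac{1}{n} \sum_{i=1}^{n} X_i^2 - \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \overline{X}_n)^2.$$

For the observed values $X_1 = 2$ and $X_2 = -1$, we obtain the value $-2$ for the estimate. This is
unacceptable. Because $[E(X)]^2 \ge 0$, we should demand an estimate that is also nonnegative.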
8.8 Fisher Information


Commentary 


Although this section is optional, it does contain the interesting theoretical result on asymptotic normality 
of maximum likelihood estimators. It also contains the Cramér-Rao inequality, which can be useful for 
finding minimum variance unbiased estimators. However, the material is really only suitable for a fairly 
mathematically oriented course. 


Solutions to Exercises 


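1.
$$f(x \mid \mu) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{1}{2\sigma^2}(x - \mu)^2\right\},$$
$$f'(x \mid \mu) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{1}{2\sigma^2}(x - \mu)^2\right\} \frac{x - \mu}{\sigma^2} = \frac{x - \mu}{\sigma^2}\, f(x \mid \mu),$$
$$f''(x \mid \mu) = \left[\frac{(x - \mu)^2}{\sigma^4} - \frac{1}{\sigma^2}\right] f(x \mid \mu).$$
Therefore,
$$\int_{-\infty}^{\infty} f'(x \mid \mu)\, dx = \frac{1}{\sigma^2} \int_{-\infty}^{\infty} (x - \mu) f(x \mid \mu)\, dx = \frac{1}{\sigma^2} E(X - \mu) = 0,$$
and
$$\int_{-\infty}^{\infty} f''(x \mid \mu)\, dx = \frac{E[(X - \mu)^2]}{\sigma^4} - \frac{1}{\sigma^2} = \frac{\sigma^2}{\sigma^4} - \frac{1}{\sigma^2} = 0.$$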
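2. The p.f. is

$$f(x \mid p) = p(1 - p)^x, \quad \text{for } x = 0, 1, \ldots.$$

The logarithm of this is $\log(p) + x \log(1 - p)$, and the derivative is

$$\frac{1}{p} - \frac{x}{1 - p}.$$

According to Eq. (5.5.7) in the text, the variance of $X$ is $(1 - p)/p^2$, hence the Fisher information is

$$I(p) = \mathrm{Var}\left[\frac{1}{p} - \frac{X}{1 - p}\right] = \frac{\mathrm{Var}(X)}{(1 - p)^2} = \frac{1}{p^2(1 - p)}.$$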
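As a numerical check, the Fisher information of the geometric distribution can be computed directly as the expected squared score. The Python sketch below truncates the infinite sum; the function name `geometric_fisher_information`, the truncation point, and the value of `p` are illustrative assumptions.

```python
def geometric_fisher_information(p, terms=5000):
    """Approximate E[(d/dp log f(X | p))^2] for f(x | p) = p (1 - p)^x, x = 0, 1, ..."""
    total = 0.0
    pmf = p  # Pr(X = 0)
    for x in range(terms):
        score = 1.0 / p - x / (1.0 - p)  # derivative of log(p) + x log(1 - p)
        total += score * score * pmf
        pmf *= 1.0 - p  # Pr(X = x + 1) = Pr(X = x) * (1 - p)
    return total

p = 0.3  # arbitrary illustrative value
numeric = geometric_fisher_information(p)
closed_form = 1.0 / (p * p * (1.0 - p))

print(numeric, closed_form)
```

Because the score has mean 0, its second moment equals its variance, which is why the solution can work with $\mathrm{Var}(X)$ alone.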
3.
$$f(x \mid \theta) = \frac{\exp(-\theta)\theta^x}{x!},$$
$$\lambda(x \mid \theta) = -\theta + x \log\theta - \log(x!),$$
$$\lambda'(x \mid \theta) = -1 + \frac{x}{\theta},$$
$$\lambda''(x \mid \theta) = -\frac{x}{\theta^2}.$$

Therefore, by Eq. (8.8.3),

$$I(\theta) = -E_\theta[\lambda''(X \mid \theta)] = \frac{E(X)}{\theta^2} = \frac{1}{\theta}.$$
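4.
$$f(x \mid \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{x^2}{2\sigma^2}\right\},$$
$$\lambda(x \mid \sigma) = -\log\sigma - \frac{x^2}{2\sigma^2} + \text{const.},$$
$$\lambda'(x \mid \sigma) = -\frac{1}{\sigma} + \frac{x^2}{\sigma^3},$$
$$\lambda''(x \mid \sigma) = \frac{1}{\sigma^2} - \frac{3x^2}{\sigma^4}.$$
Therefore,
$$I(\sigma) = -E_\sigma[\lambda''(X \mid \sigma)] = -\frac{1}{\sigma^2} + \frac{3E(X^2)}{\sigma^4} = -\frac{1}{\sigma^2} + \frac{3}{\sigma^2} = \frac{2}{\sigma^2}.$$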
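5. Let $v = \sigma^2$. Then
$$f(x \mid v) = \frac{1}{\sqrt{2\pi v}} \exp\left\{-\frac{x^2}{2v}\right\},$$
$$\lambda(x \mid v) = -\frac{1}{2}\log v - \frac{x^2}{2v} + \text{const.},$$
$$\lambda'(x \mid v) = -\frac{1}{2v} + \frac{x^2}{2v^2},$$
$$\lambda''(x \mid v) = \frac{1}{2v^2} - \frac{x^2}{v^3}.$$
Therefore,
$$I(\sigma^2) = I(v) = -E_v[\lambda''(X \mid v)] = -\frac{1}{2v^2} + \frac{1}{v^2} = \frac{1}{2v^2} = \frac{1}{2\sigma^4}.$$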


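6. Let $g(x \mid \mu)$ denote the p.d.f. or the p.f. of $X$ when $\mu$ is regarded as the parameter. Then $g(x \mid \mu) =
f[x \mid \psi(\mu)]$. Therefore,

$$\log g(x \mid \mu) = \log f[x \mid \psi(\mu)] = \lambda[x \mid \psi(\mu)],$$

and

$$\frac{\partial}{\partial\mu} \log g(x \mid \mu) = \lambda'[x \mid \psi(\mu)]\, \psi'(\mu).$$

It now follows that

$$I_1(\mu) = E_\mu\left\{\left[\frac{\partial}{\partial\mu} \log g(X \mid \mu)\right]^2\right\} = [\psi'(\mu)]^2 E_\mu\left(\{\lambda'[X \mid \psi(\mu)]\}^2\right) = [\psi'(\mu)]^2 I_0[\psi(\mu)].$$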
7. We know that $E(\overline{X}_n) = p$ and $\mathrm{Var}(\overline{X}_n) = p(1 - p)/n$. It was shown in Example 8.8.2 that $I(p) =
1/[p(1 - p)]$. Therefore, $\mathrm{Var}(\overline{X}_n)$ is equal to the lower bound $1/[nI(p)]$ provided by the information
inequality.


8. We know that $E(\overline{X}_n) = \mu$ and $\mathrm{Var}(\overline{X}_n) = \sigma^2/n$. It was shown in Example 8.8.3 that $I(\mu) = 1/\sigma^2$.
Therefore, $\mathrm{Var}(\overline{X}_n)$ is equal to the lower bound $1/[nI(\mu)]$ provided by the information inequality.


9. We shall attack this exercise by trying to find an estimator of the form $c|X|$ that is unbiased. One
approach is as follows: We know that $X^2/\sigma^2$ has the $\chi^2$ distribution with one degree of freedom.
Therefore, by Exercise 11 of Sec. 8.2, $|X|/\sigma$ has the $\chi$ distribution with one degree of freedom, and it
was shown in that exercise that

$$E\left(\frac{|X|}{\sigma}\right) = \frac{\sqrt{2}\,\Gamma(1)}{\Gamma(1/2)} = \sqrt{\frac{2}{\pi}}.$$

Hence, $E(|X|) = \sigma\sqrt{2/\pi}$. It follows that $E(|X|\sqrt{\pi/2}) = \sigma$. Let $\delta = |X|\sqrt{\pi/2}$. Then

$$E(\delta^2) = \frac{\pi}{2} E(|X|^2) = \frac{\pi}{2}\sigma^2.$$

Hence,

$$\mathrm{Var}\, \delta = E(\delta^2) - [E(\delta)]^2 = \frac{\pi}{2}\sigma^2 - \sigma^2 = \left(\frac{\pi}{2} - 1\right)\sigma^2.$$

Since $1/I(\sigma) = \sigma^2/2$, it follows that $\mathrm{Var}(\delta) > 1/I(\sigma)$.


Another unbiased estimator is $\delta_1(X) = \sqrt{2\pi}\, X$ if $X \ge 0$ and $\delta_1(X) = 0$ if $X < 0$. However, it can
be shown, using advanced methods, that the estimator $\delta$ found in this exercise is the only unbiased
estimator of $\sigma$ that depends on $X$ only through $|X|$.


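10. If $m(\sigma) = \log\sigma$, then $m'(\sigma) = 1/\sigma$ and $[m'(\sigma)]^2 = 1/\sigma^2$. Also, it was shown in Exercise 4 that
$I(\sigma) = 2/\sigma^2$. Therefore, if $T$ is an unbiased estimator of $\log\sigma$, it follows from the relation (8.8.14) that

$$\mathrm{Var}(T) \ge \frac{1}{\sigma^2} \cdot \frac{\sigma^2}{2n} = \frac{1}{2n}.$$
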
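11. If $f(x \mid \theta) = a(\theta)b(x)\exp[c(\theta)d(x)]$, then

$$\lambda(x \mid \theta) = \log a(\theta) + \log b(x) + c(\theta)d(x)$$

and

$$\lambda'(x \mid \theta) = \frac{a'(\theta)}{a(\theta)} + c'(\theta)d(x).$$

Therefore,

$$\lambda_n'(\boldsymbol{X} \mid \theta) = \sum_{i=1}^{n} \lambda'(X_i \mid \theta) = n\frac{a'(\theta)}{a(\theta)} + c'(\theta) \sum_{i=1}^{n} d(X_i).$$

If we choose

$$u(\theta) = \frac{1}{c'(\theta)} \quad \text{and} \quad v(\theta) = \frac{-n a'(\theta)}{a(\theta)c'(\theta)},$$

then Eq. (8.8.14) will be satisfied with $T = \sum_{i=1}^{n} d(X_i)$. Hence, this statistic is an efficient estimator of
its expectation.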


12. Let $\theta = \sigma^2$ denote the unknown variance. Then

$$f(x \mid \theta) = \frac{1}{(2\pi\theta)^{1/2}} \exp\left\{-\frac{1}{2\theta}(x - \mu)^2\right\}.$$

This p.d.f. $f(x \mid \theta)$ has the form of an exponential family, as given in Exercise 11, with $d(x) = (x - \mu)^2$.
Therefore, $T = \sum_{i=1}^{n} (X_i - \mu)^2$ will be an efficient estimator. Since $E[(X_i - \mu)^2] = \sigma^2$ for $i = 1, \ldots, n$,
then $E(T) = n\sigma^2$. Also, by Exercise 17 of Sec. 5.7, $E[(X_i - \mu)^4] = 3\sigma^4$ for $i = 1, \ldots, n$. Therefore,
$\mathrm{Var}[(X_i - \mu)^2] = 3\sigma^4 - \sigma^4 = 2\sigma^4$, and it follows that $\mathrm{Var}(T) = 2n\sigma^4$.


It should be emphasized that any linear function of $T$ will also be an efficient estimator. In particular,
$T/n$ will be an efficient estimator of $\theta$.


13. The incorrect part of the argument is at the beginning, because the information inequality cannot be
applied to the uniform distribution. For each different value of $\theta$, there is a different set of values of $x$
for which $f(x \mid \theta) > 0$.


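14.
$$f(x \mid \alpha) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, x^{\alpha - 1} \exp(-\beta x),$$
$$\lambda(x \mid \alpha) = \alpha\log\beta - \log\Gamma(\alpha) + (\alpha - 1)\log x - \beta x,$$
$$\lambda'(x \mid \alpha) = \log\beta - \frac{\Gamma'(\alpha)}{\Gamma(\alpha)} + \log x,$$
$$\lambda''(x \mid \alpha) = -\frac{\Gamma(\alpha)\Gamma''(\alpha) - [\Gamma'(\alpha)]^2}{[\Gamma(\alpha)]^2}.$$
Therefore,
$$I(\alpha) = \frac{\Gamma(\alpha)\Gamma''(\alpha) - [\Gamma'(\alpha)]^2}{[\Gamma(\alpha)]^2}.$$

The distribution of the M.L.E. of $\alpha$ will be approximately the normal distribution with mean $\alpha$ and
variance $1/[nI(\alpha)]$.

It should be noted that we have determined this distribution without actually determining the M.L.E.
itself.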


15. We know that the M.L.E. of $\mu$ is $\hat{\mu} = \overline{x}_n$ and, from Example 8.8.3, that $I(\mu) = 1/\sigma^2$. The posterior
distribution of $\mu$ will be approximately a normal distribution with mean $\hat{\mu}$ and variance $1/[nI(\hat{\mu})] = \sigma^2/n$.


16. We know that the M.L.E. of $p$ is $\hat{p} = \overline{x}_n$ and, from Example 8.8.2, that $I(p) = 1/[p(1 - p)]$. The
posterior distribution of $p$ will be approximately a normal distribution with mean $\hat{p}$ and variance
$1/[nI(\hat{p})] = \overline{x}_n(1 - \overline{x}_n)/n$.
17. The derivative of the log-likelihood with respect to $p$ is

$$\lambda'(x \mid p) = \frac{\partial}{\partial p}\left[\log\binom{n}{x} + x\log(p) + (n - x)\log(1 - p)\right] = \frac{x}{p} - \frac{n - x}{1 - p} = \frac{x - np}{p(1 - p)}.$$

The mean of $\lambda'(X \mid p)$ is clearly 0, so its variance is

$$I(p) = \frac{\mathrm{Var}(X)}{p^2(1 - p)^2} = \frac{n}{p(1 - p)}.$$


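18. The derivative of the log-likelihood with respect to $p$ is

$$\lambda'(x \mid p) = \frac{\partial}{\partial p}\left[\log\binom{r + x - 1}{x} + r\log(p) + x\log(1 - p)\right] = \frac{r}{p} - \frac{x}{1 - p} = \frac{r - rp - xp}{p(1 - p)}.$$

The mean of $\lambda'(X \mid p)$ is clearly 0, so its variance is

$$I(p) = \frac{p^2\, \mathrm{Var}(X)}{p^2(1 - p)^2} = \frac{r}{p^2(1 - p)}.$$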


8.9 Supplementary Exercises 


Solutions to Exercises 


1. According to Exercise 5 in Sec. 8.8, the Fisher information $I(\sigma^2)$ based on a sample of size 1 is $1/[2\sigma^4]$.
According to the information inequality, the variance of an unbiased estimator of $\sigma^2$ must be at least
$2\sigma^4/n$. The variance of $V = \sum_{i=1}^{n} X_i^2/n$ is $\mathrm{Var}(X_1^2)/n$. Since $X_1^2/\sigma^2$ has a $\chi^2$ distribution with 1 degree
of freedom, its variance is 2. Hence $\mathrm{Var}(X_1^2) = 2\sigma^4$ and $\mathrm{Var}(V)$ equals the lower bound from the
information inequality. $E(V) = E(X_1^2) = \sigma^2$, so $V$ is unbiased.


2. The t distribution with one degree of freedom is the Cauchy distribution. Therefore, by Exercise 18 of 
Sec. 5.6, we can represent the random variable X in the form X = U/V, where U and V are independent 
and each has a standard normal distribution. But 1/X can then be represented as 1/X = V/U. Since 
V/U is again the ratio of independent, standard normal variables, it follows that 1/X again has the 
Cauchy distribution. 


3. It is known from Exercise 18 of Sec. 5.6 that $U/V$ has a Cauchy distribution, which is the $t$ distribution
with one degree of freedom. Next, since $|V| = (V^2)^{1/2}$, it follows from Definition 8.4.1 that $U/|V|$ has
the required $t$ distribution. Hence, by the previous exercise in this section, $|V|/U$ will also have this $t$
distribution. Since $U$ and $V$ are i.i.d., it now follows that $|U|/V$ must have the same distribution as
$|V|/U$.


4. It is known from Exercise 5 of Sec. 8.3 that $X_1 + X_2$ and $X_1 - X_2$ are independent. Further, if we let

$$Y_1 = \frac{1}{\sqrt{2}\,\sigma}(X_1 + X_2) \quad \text{and} \quad Y_2 = \frac{1}{\sqrt{2}\,\sigma}(X_1 - X_2),$$

then $Y_1$ and $Y_2$ have standard normal distributions. It follows, therefore, from Exercise 18 of Sec. 5.6
that $Y_1/Y_2$ has a Cauchy distribution, which is the same as the $t$ distribution with one degree of freedom.
But

$$\frac{Y_1}{Y_2} = \frac{X_1 + X_2}{X_1 - X_2},$$

so the desired result has been established. This result could also have been established by a direct
calculation of the required p.d.f.


5. Since $X_i$ has the exponential distribution with parameter $\beta$, it follows that $2\beta X_i$ has the exponential
distribution with parameter 1/2. But this exponential distribution is the $\chi^2$ distribution with 2 degrees
of freedom. Therefore, the sum of the i.i.d. random variables $2\beta X_i$ ($i = 1, \ldots, n$) will have a $\chi^2$
distribution with $2n$ degrees of freedom.


6. Let $\hat{\theta}_n$ be the proportion of the $n$ observations that lie in the set $A$. Since each observation has
probability $\theta$ of lying in $A$, the observations can be thought of as forming $n$ Bernoulli trials, each with
probability $\theta$ of success. Hence, $E(\hat{\theta}_n) = \theta$ and $\mathrm{Var}(\hat{\theta}_n) = \theta(1 - \theta)/n$.


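7. (a) $E(\alpha S_X^2 + \beta S_Y^2) = \alpha(m - 1)\sigma^2 + \beta(n - 1)2\sigma^2$.
Hence, this estimator will be unbiased if $\alpha(m - 1) + 2\beta(n - 1) = 1$.
(b) Since $S_X^2$ and $S_Y^2$ are independent,

$$\mathrm{Var}(\alpha S_X^2 + \beta S_Y^2) = \alpha^2\, \mathrm{Var}(S_X^2) + \beta^2\, \mathrm{Var}(S_Y^2)$$
$$= \alpha^2[2(m - 1)\sigma^4] + \beta^2[2(n - 1) \cdot 4\sigma^4]$$
$$= 2\sigma^4[(m - 1)\alpha^2 + 4(n - 1)\beta^2].$$

Therefore, we must minimize

$$A = (m - 1)\alpha^2 + 4(n - 1)\beta^2$$

subject to the constraint $(m - 1)\alpha + 2(n - 1)\beta = 1$. If we solve this constraint for $\beta$ in terms of $\alpha$,
and make this substitution for $\beta$ in $A$, we can then minimize $A$ over all values of $\alpha$. The result is

$$\alpha = \frac{1}{m + n - 2} \quad \text{and, hence,} \quad \beta = \frac{1}{2(m + n - 2)}.$$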


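8. $X_{n+1} - \overline{X}_n$ has the normal distribution with mean 0 and variance $(1 + 1/n)\sigma^2$. Hence, the distribution
of $(n/[n + 1])^{1/2}(X_{n+1} - \overline{X}_n)/\sigma$ is a standard normal distribution. Also, $nT_n^2/\sigma^2$ has an independent
$\chi^2$ distribution with $n - 1$ degrees of freedom. Thus, the following ratio will have the $t$ distribution
with $n - 1$ degrees of freedom:

$$\frac{\left(\frac{n}{n+1}\right)^{1/2} (X_{n+1} - \overline{X}_n)/\sigma}{\left[\frac{nT_n^2}{(n-1)\sigma^2}\right]^{1/2}} = \left(\frac{n-1}{n+1}\right)^{1/2} \frac{X_{n+1} - \overline{X}_n}{T_n}.$$

It can now be seen that $k = ([n - 1]/[n + 1])^{1/2}$.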


9. Under the given conditions, $Y/(2\sigma)$ has a standard normal distribution and $S_n^2/\sigma^2$ has an independent
$\chi^2$ distribution with $n - 1$ degrees of freedom. Thus, the following random variable will have a $t$
distribution with $n - 1$ degrees of freedom:

$$\frac{Y/(2\sigma)}{\{S_n^2/[\sigma^2(n - 1)]\}^{1/2}} = \frac{Y/2}{\sigma'},$$

where $\sigma' = [S_n^2/(n - 1)]^{1/2}$.
10. As found in Exercise 3 of Sec. 8.5, the expected squared length of the confidence interval is $E(L^2) =
4c^2\sigma^2/n$, where $c$ is found from the table of the $t$ distribution with $n - 1$ degrees of freedom in the back
of the book under the .95 column (to give probability .90 between $-c$ and $c$). We must compute the
value of $4c^2/n$ for various values of $n$ and see when it is less than 1/2. For $n = 23$, it is found that
$c_{22} = 1.717$ and the coefficient of $\sigma^2$ in $E(L^2)$ is $4(1.717)^2/23 = .513$. For $n = 24$, $c_{23} = 1.714$ and the
coefficient of $\sigma^2$ is $4(1.714)^2/24 = .490$. Hence, $n = 24$ is the required value.


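11. Let $c$ denote the .99 quantile of the $t$ distribution with $n - 1$ degrees of freedom; i.e., $\Pr(U < c) = .99$
if $U$ has the specified $t$ distribution. Therefore,

$$\Pr\left(\frac{n^{1/2}(\overline{X}_n - \mu)}{\sigma'} < c\right) = .99$$

or, equivalently,

$$\Pr\left(\mu > \overline{X}_n - \frac{c\sigma'}{n^{1/2}}\right) = .99.$$

Hence, $L = \overline{X}_n - c\sigma'/n^{1/2}$.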


12. Let $c$ denote the .01 quantile of the $\chi^2$ distribution with $n - 1$ degrees of freedom; i.e., $\Pr(V < c) = .01$
if $V$ has the specified $\chi^2$ distribution. Therefore,

$$\Pr\left(\frac{S_n^2}{\sigma^2} > c\right) = .99$$

or, equivalently,

$$\Pr(\sigma^2 < S_n^2/c) = .99.$$

Hence, $U = S_n^2/c$.


13. (a) The posterior distribution of $\theta$ is the normal distribution with mean $\mu_1$ and variance $v_1^2$, as given
by (7.3.1) and (7.3.2). Therefore, under this distribution,

$$\Pr(\mu_1 - 1.96v_1 < \theta < \mu_1 + 1.96v_1) = .95.$$

This interval $I$ is the shortest one that has the required probability because it is symmetrically
placed around the mean $\mu_1$ of the normal distribution.


(b) It follows from (7.3.1) that $\mu_1 \to \overline{x}_n$ as $v^2 \to \infty$ and from (7.3.2) that $v_1^2 \to \sigma^2/n$. Hence, the
interval $I$ converges to the interval

$$\overline{x}_n - \frac{1.96\sigma}{n^{1/2}} < \theta < \overline{x}_n + \frac{1.96\sigma}{n^{1/2}}.$$

It was shown in Exercise 4 of Sec. 8.5 that this interval is a confidence interval for $\theta$ with confidence
coefficient .95.


14. (a) Since $Y$ has a Poisson distribution with mean $n\theta$, it follows that

$$E(\exp(-cY)) = \sum_{y=0}^{\infty} \frac{\exp(-cy)\exp(-n\theta)(n\theta)^y}{y!} = \exp(-n\theta) \sum_{y=0}^{\infty} \frac{(n\theta\exp(-c))^y}{y!}$$
$$= \exp(-n\theta)\exp[n\theta\exp(-c)] = \exp(n\theta[\exp(-c) - 1]).$$

Since this expectation must be $\exp(-\theta)$, it follows that $n(\exp(-c) - 1) = -1$ or $c = \log[n/(n - 1)]$.
(b) It was shown in Exercise 3 of Sec. 8.8 that $I(\theta) = 1/\theta$ in this problem. Since $m(\theta) = \exp(-\theta)$,
$[m'(\theta)]^2 = \exp(-2\theta)$. Hence, from Eq. (8.8.14),

$$\mathrm{Var}(\exp(-cY)) \ge \frac{\theta\exp(-2\theta)}{n}.$$
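The choice $c = \log[n/(n - 1)]$ in part (a) can be verified by summing the Poisson series numerically. The Python sketch below uses arbitrary illustrative values of `n` and `theta`, and the function name `expected_exp` is ours.

```python
import math

def expected_exp(c, mean, terms=400):
    """Approximate E[exp(-c * Y)] for Y ~ Poisson(mean) by truncating the series."""
    total = 0.0
    pmf = math.exp(-mean)  # Pr(Y = 0)
    for y in range(terms):
        total += math.exp(-c * y) * pmf
        pmf *= mean / (y + 1)  # Pr(Y = y + 1) from Pr(Y = y)
    return total

n, theta = 8, 0.7  # hypothetical sample size and parameter value
c = math.log(n / (n - 1))

value = expected_exp(c, n * theta)
print(value, math.exp(-theta))  # the two numbers should agree
```

With this value of $c$, $\exp(-c) = (n - 1)/n$, so the exponent $n\theta[\exp(-c) - 1]$ reduces to $-\theta$ as required.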
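15. In the notation of Sec. 8.8,

$$\lambda(x \mid \theta) = \log\theta + (\theta - 1)\log x,$$
$$\lambda'(x \mid \theta) = \frac{1}{\theta} + \log x,$$
$$\lambda''(x \mid \theta) = -1/\theta^2.$$

Hence, by Eq. (8.8.3), $I(\theta) = 1/\theta^2$ and it follows that the asymptotic distribution of
$[nI(\theta)]^{1/2}(\hat{\theta}_n - \theta) = n^{1/2}(\hat{\theta}_n - \theta)/\theta$ is standard normal.


16.
$$f(x \mid \theta) = \theta^{-1}\exp(-x/\theta),$$
$$\lambda(x \mid \theta) = -\log\theta - x/\theta,$$
$$\lambda'(x \mid \theta) = -\frac{1}{\theta} + \frac{x}{\theta^2},$$
$$\lambda''(x \mid \theta) = \frac{1}{\theta^2} - \frac{2x}{\theta^3}.$$
Therefore,
$$I(\theta) = -E_\theta[\lambda''(X \mid \theta)] = \frac{1}{\theta^2}.$$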

If m(p) = (1—p)?, then m’(p) = —2(1 — p) and [m’(p)]? = 4(1 — p)?. It was shown in Example 8.8.2 
that I(p) = 1/[p(1 — p)]. Therefore, if Tis an unbiased estimator of m(p), it follows from the relation 
(8.8.14) that 


var(T) > 45 p) p(l—p) _ 4p —p)* 


n n 
f(x|B) = Bexp(—S8x). This p.d.f. has the form of an exponential family, as given in Exercise 11 of 
Sec. 8.8, with d(z) = x. Therefore, T = > X; will be an efficient estimator. We know that E(X;) = 1/8 
i=1 
and Var (X;) = 1/87. Hence, E(T) =n/@ and Var(T) = n/A?. 


Since any linear function of T will also be an efficient estimator, it follows that X,, = T/n will be an 
efficient estimator of 1/3. As a check of this result, it can be verified directly that Var(X,,) = 1/[n8?] = 
[m’(8)]?/[nI(B)], where m(8) = 1/8 and I(8) was obtained in Example 8.8.6. 


It was shown in Example 8.8.6 that J(@) = 1/8”. The distribution of the M.L.E. of 8 will be approxi- 
mately the normal distribution with mean (6 and variance 1/[nJ(()]. 


(a) Let a(8) =1/8. Then a/(6) = —1/8?. By Exercise (19, it is known that Bn is approximately 
normal with mean { and variance 8?/n. Therefore, 1/6, will be approximately normal with mean 
1/8 and variance [a’(3)]?(8?/n) = 1/(nB?). Equivalently, the asymptotic distribution of 


(n6*)"/2(1/6, — 1/8) 


is standard normal. 
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(b) Since the mean of the exponential distribution is 1/6 and the variance is 1/ 67, it follows directly 
from the central limit theorem that the asymptotic distribution of X;, = 1/Bp is exactly that found 
in part (a). 


21. (a) The distribution of $Y$ is the Poisson distribution with mean $n\theta$. In order for $r(Y)$ to be an unbiased
estimator of $1/\theta$, we need

$$\frac{1}{\theta} = E_\theta(r(Y)) = \sum_{y=0}^{\infty} r(y)\exp(-n\theta)\frac{(n\theta)^y}{y!}.$$

This equation can be rewritten as

$$\exp(n\theta) = \sum_{y=0}^{\infty} \frac{r(y)n^y}{y!}\theta^{y+1}. \qquad \text{(S.8.8)}$$

The function on the left side of (S.8.8) has a unique power series representation, hence the right
side of (S.8.8) must equal that power series. However, the limit as $\theta \to 0$ of the left side of (S.8.8)
is 1, while the limit of the right side is 0, hence the power series on the right cannot represent the
function on the left.

(b) $E(n/[Y+1]) = \sum_{y=0}^{\infty} n\exp(-n\theta)[n\theta]^y/(y+1)!$. By letting $u = y + 1$ in this sum, we get
$n[1 - \exp(-n\theta)]/[n\theta] = 1/\theta - \exp(-n\theta)/\theta$. So the bias is $-\exp(-n\theta)/\theta$. Clearly $\exp(-n\theta)$ goes to 0 as
$n \to \infty$.

(c) $n/(1+Y) = 1/(\overline{X}_n + 1/n)$. We know that $\overline{X}_n + 1/n$ has approximately the normal distribution
with mean $\theta + 1/n$ and variance $\theta/n$. We can ignore the $1/n$ added to $\theta$ in the mean since this
will eventually be small relative to $\theta$. Using the delta method, we find that $1/(\overline{X}_n + 1/n)$ has
approximately the normal distribution with mean $1/\theta$ and variance $(1/\theta^2)^2\theta/n = (n\theta^3)^{-1}$.


22. (a) The p.d.f. of $Y_n$ is

$$f(y|\theta) = \begin{cases} \dfrac{ny^{n-1}}{\theta^n} & \text{if } 0 \leq y \leq \theta, \\ 0 & \text{otherwise.} \end{cases}$$

This can be found using the method of Example 3.9.6. If $X = Y_n/\theta$, then the p.d.f. of $X$ is

$$g(x|\theta) = f(x\theta|\theta)\theta = \begin{cases} nx^{n-1} & \text{if } 0 < x < 1, \\ 0 & \text{otherwise.} \end{cases}$$

Notice that this does not depend on $\theta$. The c.d.f. is then $G(x) = x^n$ for $0 < x < 1$. The quantile
function is $G^{-1}(p) = p^{1/n}$.

(b) The bias of $Y_n$ as an estimator of $\theta$ is

$$E_\theta(Y_n) - \theta = \int_0^\theta y\,\frac{ny^{n-1}}{\theta^n}\,dy - \theta = -\frac{\theta}{n+1}.$$

(c) The distribution of $Z = Y_n/\theta$ has p.d.f.

$$g(z) = \theta f(z\theta|\theta) = \begin{cases} nz^{n-1} & \text{for } 0 < z < 1, \\ 0 & \text{otherwise,} \end{cases}$$

where $f(\cdot|\theta)$ comes from part (a). One can see that $g(z)$ does not depend on $\theta$, hence the
distribution of $Z$ is the same for all $\theta$.

(d) We would like to find two random variables $A(Y_n)$ and $B(Y_n)$ such that

$$\Pr(A(Y_n) \leq \theta \leq B(Y_n)) = \gamma, \text{ for all } \theta. \qquad \text{(S.8.9)}$$

This can be arranged by using the fact that $Y_n/\theta$ has the c.d.f. $G(x) = x^n$ for all $\theta$. This means
that

$$\Pr\left(a \leq \frac{Y_n}{\theta} \leq b\right) = b^n - a^n,$$

for all $\theta$. Let $a$ and $b$ be constants such that $b^n - a^n = \gamma$ (e.g., $b = ([1+\gamma]/2)^{1/n}$ and $a =
([1-\gamma]/2)^{1/n}$). Then set $A(Y_n) = Y_n/b$ and $B(Y_n) = Y_n/a$. It follows that (S.8.9) holds.
Testing Hypotheses


9.1 Problems of Testing Hypotheses 


Commentary 


This section was augmented in the fourth edition. It now includes a general introduction to likelihood ratio
tests and some foundational discussion of the terminology of hypothesis testing. After covering this section,
one could skip directly to Sec. 9.5 and discuss the t test without using any of the material in Secs. 9.2-9.4.
Indeed, unless your course is a rigorous mathematical statistics course, it might be highly advisable to skip
ahead.


Solutions to Exercises 


1. (a) Let $\delta$ be the test that rejects $H_0$ when $X \geq 1$. The power function of $\delta$ is

$$\pi(\beta|\delta) = \Pr(X \geq 1|\beta) = \exp(-\beta),$$

for $\beta > 0$.

(b) The size of the test $\delta$ is $\sup_{\beta \geq 1}\pi(\beta|\delta)$. Using the answer to part (a), we see that $\pi(\beta|\delta)$ is a
decreasing function of $\beta$, hence the size of the test is $\pi(1|\delta) = \exp(-1)$.


2. (a) We know that if $0 < y < \theta$, then $\Pr(Y_n \leq y) = (y/\theta)^n$. Also, if $y \geq \theta$, then $\Pr(Y_n \leq y) = 1$.
Therefore, if $\theta \leq 1.5$, then $\pi(\theta) = \Pr(Y_n \leq 1.5) = 1$. If $\theta > 1.5$, then $\pi(\theta) = \Pr(Y_n \leq 1.5) =
(1.5/\theta)^n$.

(b) The size of the test is

$$\alpha = \sup_{\theta \geq 2}\pi(\theta) = \sup_{\theta \geq 2}\left(\frac{1.5}{\theta}\right)^n = \left(\frac{1.5}{2}\right)^n = \left(\frac{3}{4}\right)^n.$$


3. (a) For any given value of $p$, $\pi(p) = \Pr(Y \geq 7) + \Pr(Y \leq 1)$, where $Y$ has a binomial distribution with
parameters $n = 20$ and $p$. For $p = 0$, $\Pr(Y \geq 7) = 0$ and $\Pr(Y \leq 1) = 1$. Therefore, $\pi(0) = 1$. For
$p = 0.1$, it is found from the table of the binomial distribution that

$$\Pr(Y \geq 7) = .0020 + .0003 + .0001 + .0000 = .0024$$

and $\Pr(Y \leq 1) = .1216 + .2701 = .3917$. Hence, $\pi(0.1) = 0.3941$. Similarly, for $p = 0.2$, it is found
that

$$\Pr(Y \geq 7) = .0545 + .0222 + .0074 + .0020 + .0005 + .0001 = .0867$$

and $\Pr(Y \leq 1) = .0115 + .0576 = .0691$. Hence, $\pi(0.2) = 0.1558$. By continuing to use the tables
in this way, we can find the values of $\pi(0.3)$, $\pi(0.4)$, and $\pi(0.5)$. For $p = 0.6$, we must use the
fact that if $Y$ has a binomial distribution with parameters 20 and 0.6, then $Z = 20 - Y$ has a
binomial distribution with parameters 20 and 0.4. Also, $\Pr(Y \geq 7) = \Pr(Z \leq 13)$ and $\Pr(Y \leq
1) = \Pr(Z \geq 19)$. It is found from the tables that $\Pr(Z \leq 13) = .9935$ and $\Pr(Z \geq 19) = .0000$.
Hence, $\pi(0.6) = .9935$. Similarly, if $p = 0.7$, then $Z = 20 - Y$ will have a binomial distribution with
parameters 20 and 0.3. In this case it is found that $\Pr(Z \leq 13) = .9998$ and $\Pr(Z \geq 19) = .0000$.
Hence, $\pi(0.7) = 0.9998$. By continuing in this way, the values of $\pi(0.8)$, $\pi(0.9)$, and $\pi(1.0) = 1$
can be obtained.


(b) Since $H_0$ is a simple hypothesis, the size $\alpha$ of the test is just the value of the power function at
the point specified by $H_0$. Thus, $\alpha = \pi(0.2) = 0.1558$.
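The table lookups in Exercise 3 can be reproduced by summing binomial probabilities directly. This is a minimal sketch; the function names `binom_pmf` and `power` are introduced here for illustration, and small differences from the table-based values are expected because the tables are rounded to four places.

```python
from math import comb

def binom_pmf(k, n, p):
    """Pr(Y = k) for Y ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def power(p, n=20):
    """Power function pi(p) = Pr(Y >= 7) + Pr(Y <= 1)."""
    upper = sum(binom_pmf(k, n, p) for k in range(7, n + 1))
    lower = sum(binom_pmf(k, n, p) for k in range(0, 2))
    return upper + lower

for p in (0.0, 0.1, 0.2, 0.6, 0.7, 1.0):
    print(f"p = {p:.1f}   pi(p) = {power(p):.4f}")
```

The value at $p = 0.2$ is the size of the test from part (b).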


4. The null hypothesis $H_0$ is simple. Therefore, the size $\alpha$ of the test is $\alpha = \Pr(\text{Rejecting } H_0 \mid \mu = \mu_0)$.
When $\mu = \mu_0$, the random variable $Z = n^{1/2}(\overline{X}_n - \mu_0)$ will have the standard normal distribution.
Hence, since $n = 25$,

$$\alpha = \Pr(|\overline{X}_n - \mu_0| \geq c) = \Pr(|Z| \geq 5c) = 2[1 - \Phi(5c)].$$

Thus, $\alpha = 0.05$ if and only if $\Phi(5c) = 0.975$. It is found from a table of the standard normal distribution
that $5c = 1.96$ and $c = 0.392$.


5. A hypothesis is simple if and only if it specifies a single value of both $\mu$ and $\sigma$. Therefore, only the
hypothesis in (a) is simple. All the others are composite. In particular, although the hypothesis in (d)
specifies the value of $\mu$, it leaves the value of $\sigma$ arbitrary.


6. If $H_0$ is true, then $X$ will surely be smaller than 3.5. If $H_1$ is true, then $X$ will surely be greater than
3.5. Therefore, the test procedure which rejects $H_0$ if and only if $X > 3.5$ will have probability 0 of
leading to a wrong decision, no matter what the true value of $\theta$ is.


7. Let $C$ be the critical region of $Y_n$ values for the test $\delta$, and let $C^*$ be the critical region for $\delta^*$. It is
easy to see that $C^* \subset C$. Hence

$$\pi(\theta|\delta) - \pi(\theta|\delta^*) = \Pr(Y_n \in C \cap (C^*)^c \mid \theta).$$

Here $C \cap (C^*)^c = [4, 4.5]$, so

$$\pi(\theta|\delta) - \pi(\theta|\delta^*) = \Pr(4 \leq Y_n \leq 4.5 \mid \theta). \qquad \text{(S.9.1)}$$

(a) For $\theta \leq 4$, $\Pr(4 \leq Y_n \mid \theta) = 0$, so the two power functions must be equal by (S.9.1).

(b) For $\theta > 4$,

$$\Pr(4 \leq Y_n \leq 4.5 \mid \theta) = \frac{(\min\{\theta, 4.5\})^n - 4^n}{\theta^n} > 0.$$

Hence, $\pi(\theta|\delta) > \pi(\theta|\delta^*)$ by (S.9.1).

(c) The only places where the power functions differ are for $\theta > 4$. Since these values are all in $\Omega_1$,
it is better for a test to have higher power function for these values. Since $\delta$ has higher power
function than $\delta^*$ for all of these values, $\delta$ is the better test.


8. (a) The distribution of $Z$ given $\mu$ is the normal distribution with mean $n^{1/2}(\mu - \mu_0)$ and variance 1.
We can write

$$\Pr(Z \geq c \mid \mu) = 1 - \Phi(c - n^{1/2}[\mu - \mu_0]) = \Phi(n^{1/2}\mu - n^{1/2}\mu_0 - c).$$

Since $\Phi$ is an increasing function and $n^{1/2}\mu - n^{1/2}\mu_0 - c$ is an increasing function of $\mu$, the power
function is an increasing function of $\mu$.

(b) The size of the test will be the power function at $\mu = \mu_0$, since $\mu_0$ is the largest value in $\Omega_0$ and
the power function is increasing. Hence, the size is $\Phi(-c)$. If we set this equal to $\alpha_0$, we can solve
for $c = -\Phi^{-1}(\alpha_0)$.


9. A sensible test would be to reject $H_0$ if $\overline{X}_n < c'$. So, let $T = \mu_0 - \overline{X}_n$. Then the power function of the
test $\delta$ that rejects $H_0$ when $T \geq c$ is

$$\pi(\mu|\delta) = \Pr(T \geq c \mid \mu) = \Pr(\overline{X}_n \leq \mu_0 - c \mid \mu) = \Phi(\sqrt{n}[\mu_0 - c - \mu]).$$

Since $\Phi$ is an increasing function and $\sqrt{n}[\mu_0 - c - \mu]$ is a decreasing function of $\mu$, it follows that
$\Phi(\sqrt{n}[\mu_0 - c - \mu])$ is a decreasing function of $\mu$.


10. When $Z = z$ is observed, the p-value is $\Pr(Z \geq z \mid \mu_0) = \Phi(n^{1/2}[\mu_0 - z])$.


11. (a) For $c_1 \geq 2$, $\Pr(Y \leq c_1 \mid p = 0.4) \geq 0.23$, hence $c_1 \leq 1$. Also, for $c_2 \leq 5$, $\Pr(Y \geq c_2 \mid p = 0.4) \geq 0.26$,
hence $c_2 \geq 6$. Here are some values of the desired probability for various $(c_1, c_2)$ pairs:

 c1   c2 | Pr(Y <= c1 | p = 0.4) + Pr(Y >= c2 | p = 0.4)
  1    6 | 0.1699
  1    7 | 0.0956
  0    6 | 0.1094
 -1    6 | 0.0994

So, the closest we can get to 0.1 without going over is 0.0994, which is achieved when $c_1 < 0$ and
$c_2 = 6$.

(b) The size of the test is 0.0994, as we calculated in part (a).

(c) The power function is plotted in Fig. S.9.1. Notice that the power function is too low for values of
$p < 0.4$. This is due to the fact that the test only rejects $H_0$ when $Y \geq 6$. A better test might be
one with $c_1 = 1$ and $c_2 = 7$. Even though the size is slightly smaller (as is the power for $p > 0.4$),
its power is much greater for $p < 0.4$.
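The four table entries in part (a) can be recomputed exactly. The sketch below assumes $n = 9$ trials; that value is not stated in the solution itself and is inferred here because it reproduces all four tabulated probabilities. The names `N`, `P0`, `pmf`, and `size` are illustrative.

```python
from math import comb

# ASSUMPTION: n = 9 is inferred from the tabulated values, not stated in the solution.
N, P0 = 9, 0.4

def pmf(k):
    """Pr(Y = k) for Y ~ Binomial(N, P0)."""
    return comb(N, k) * P0**k * (1 - P0)**(N - k)

def size(c1, c2):
    """Pr(Y <= c1) + Pr(Y >= c2) when p = P0; c1 = -1 gives an empty lower tail."""
    lower = sum(pmf(k) for k in range(0, c1 + 1))
    upper = sum(pmf(k) for k in range(c2, N + 1))
    return lower + upper

for c1, c2 in [(1, 6), (1, 7), (0, 6), (-1, 6)]:
    print(f"c1 = {c1:2d}  c2 = {c2}  size = {size(c1, c2):.4f}")
```

Among the pairs listed, the one with $c_1 < 0$ and $c_2 = 6$ gives the largest value that does not exceed 0.1, in agreement with part (a).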
12. (a) The power function of $\delta_c$ is

$$\pi(\theta|\delta_c) = \Pr(X \geq c \mid \theta) = \int_c^\infty \frac{dx}{\pi[1 + (x - \theta)^2]} = \frac{1}{\pi}\left[\frac{\pi}{2} - \arctan(c - \theta)\right].$$

Since arctan is an increasing function and $c - \theta$ is a decreasing function of $\theta$, the power function
is increasing in $\theta$.

(b) To make the size of the test 0.05, we need to solve

$$0.05 = \frac{1}{\pi}\left[\frac{\pi}{2} - \arctan(c - \theta_0)\right]$$

for $c$. We get

$$c = \theta_0 + \tan(0.45\pi) = \theta_0 + 6.314.$$

Figure S.9.1: Power function of test in Exercise 11c of Sec. 9.1.

(c) The p-value when $X = x$ is observed is

$$\Pr(X \geq x \mid \theta = \theta_0) = \frac{1}{\pi}\left[\frac{\pi}{2} - \arctan(x - \theta_0)\right].$$


13. For $c = 3$, $\Pr(X \geq c \mid \theta = 1) = 0.0803$, while for $c = 2$, the probability is 0.2642. Hence, we must use
$c = 3$.


14. (a) The distribution of $X$ is a gamma distribution with parameters $n$ and $\theta$ and $Y = X\theta$ has a
gamma distribution with parameters $n$ and 1. Let $G_n$ be the c.d.f. of the gamma distribution with
parameters $n$ and 1. The power function of $\delta_c$ is then

$$\pi(\theta|\delta_c) = \Pr(X \geq c \mid \theta) = \Pr(Y \geq c\theta \mid \theta) = 1 - G_n(c\theta).$$

Since $1 - G_n$ is a decreasing function and $c\theta$ is an increasing function of $\theta$, $1 - G_n(c\theta)$ is a
decreasing function of $\theta$.

(b) We need $1 - G_n(c\theta_0) = \alpha_0$. This means that $c = G_n^{-1}(1 - \alpha_0)/\theta_0$.

(c) With $\alpha_0 = 0.1$, $n = 1$ and $\theta_0 = 2$, we find that $G_1(y) = 1 - \exp(-y)$ and $G_1^{-1}(p) = -\log(1 - p)$.
So, $c = -\log(0.1)/2 = 1.151$. The power function is plotted in Fig. S.9.2.
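For the special case in part (c), the critical value and the power function have closed forms that are easy to evaluate. The following sketch uses illustrative names (`alpha0`, `theta0`, `power`) and covers only the case $n = 1$.

```python
import math

alpha0, theta0 = 0.1, 2.0

# c = G_1^{-1}(1 - alpha0) / theta0, with G_1^{-1}(p) = -log(1 - p) when n = 1.
c = -math.log(alpha0) / theta0

def power(theta):
    """pi(theta | delta_c) = 1 - G_1(c * theta) = exp(-c * theta) when n = 1."""
    return math.exp(-c * theta)

print(f"c = {c:.3f}")
for theta in (0.5, 1.0, 2.0, 4.0):
    print(f"theta = {theta:.1f}   power = {power(theta):.4f}")
```

The power equals $\alpha_0$ at $\theta = \theta_0$ and decreases as $\theta$ grows, as shown in part (a).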


15. The p-value when $X = x$ is observed is the size of the test that rejects $H_0$ when $X \geq x$, namely

$$\Pr(X \geq x \mid \theta = 1) = \begin{cases} 0 & \text{if } x \geq 1, \\ 1 - x & \text{if } 0 < x < 1. \end{cases}$$


16. The confidence interval is $(s_n^2/c_2, s_n^2/c_1)$, where $s_n^2 = \sum_{i=1}^n (x_i - \overline{x}_n)^2$ and $c_1, c_2$ are the $(1-\gamma)/2$ and
$(1+\gamma)/2$ quantiles of the $\chi^2$ distribution with $n - 1$ degrees of freedom. We create the test $\delta_c$ of
$H_0 : \sigma^2 = c$ by rejecting $H_0$ if $c$ is not in the interval. Let $T(x) = s_n^2$ and notice that $c$ is outside of the
interval if and only if $T(x)$ is not in the interval $(c_1 c, c_2 c)$.


Figure S.9.2: Power function of test in Exercise 14c of Sec. 9.1.


17. We need $q(y)$ to have the property that $\Pr(q(Y) < p \mid p) \geq \gamma$ for all $p$. We shall prove that $q(y)$
equal to the smallest $p_0$ such that $\Pr(Y \geq y \mid p = p_0) \geq 1 - \gamma$ satisfies this property. For each $p$, let
$A_p = \{y : q(y) < p\}$. We need to show that $\Pr(Y \in A_p \mid p) \geq \gamma$. First, notice that $q(y)$ is an increasing
function of $y$. This means that for each $p$ there is $y_p$ such that $A_p = \{0, \ldots, y_p\}$. So, we need to show
that $\Pr(Y \leq y_p \mid p) \geq \gamma$ for all $p$. Equivalently, we need to show that $\Pr(Y > y_p \mid p) \leq 1 - \gamma$. Notice that
$y_p$ is the largest value of $y$ such that $q(y) < p$. That is, $y_p$ is the largest value of $y$ such that there exists
$p_0 < p$ with $\Pr(Y \geq y \mid p_0) \geq 1 - \gamma$. For each $y$, $\Pr(Y > y \mid p)$ is a continuous nondecreasing function of
$p$. If $\Pr(Y > y_p \mid p) > 1 - \gamma$, then there exists $p_0 < p$ such that

$$1 - \gamma < \Pr(Y > y_p \mid p_0) = \Pr(Y \geq y_p + 1 \mid p_0).$$

This contradicts the fact that $y_p$ is the largest $y$ such that there is $p_0 < p$ with $\Pr(Y \geq y \mid p_0) \geq 1 - \gamma$.
Hence $\Pr(Y > y_p \mid p) \leq 1 - \gamma$ and the proof is complete.


18. Our tests are all of the form "Reject $H_0$ if $T \geq c$." Let $\delta_c$ be this test, and define

$$\alpha(c) = \sup_{\theta \in \Omega_0} \Pr(T \geq c \mid \theta),$$

the size of the test $\delta_c$. Then $\delta_c$ has level of significance $\alpha_0$ if and only if $\alpha(c) \leq \alpha_0$. Notice that $\alpha(c)$ is
a decreasing function of $c$. When $T = t$ is observed, we reject $H_0$ at level of significance $\alpha_0$ using $\delta_c$ if
and only if $t \geq c$, which is equivalent to $\alpha(t) \leq \alpha_0$. Hence $\alpha(t)$ is the smallest level of significance at
which we can reject $H_0$ if $T = t$ is observed. Notice that $\alpha(t)$ is the expression in Eq. (9.1.12).


19. We want our test to reject $H_0$ if $\overline{X}_n \leq Y$, where $Y$ might be a random variable. We can write this as
not rejecting $H_0$ if $\overline{X}_n > Y$. We want $\overline{X}_n > Y$ to be equivalent to $\mu_0$ being inside of our interval. We
need the test to have level $\alpha_0$, so

$$\Pr(\overline{X}_n \leq Y \mid \mu = \mu_0, \sigma^2) = \alpha_0 \qquad \text{(S.9.2)}$$

is necessary. We know that $n^{1/2}(\overline{X}_n - \mu_0)/\sigma'$ has the $t$ distribution with $n - 1$ degrees of freedom if
$\mu = \mu_0$, hence Eq. (S.9.2) will hold if $Y = \mu_0 - n^{-1/2}\sigma' T_{n-1}^{-1}(1 - \alpha_0)$. Now, $\overline{X}_n > Y$ if and only if
$\mu_0 < \overline{X}_n + n^{-1/2}\sigma' T_{n-1}^{-1}(1 - \alpha_0)$. This is equivalent to $\mu_0$ in our interval if our interval is

$$\left(-\infty,\ \overline{X}_n + n^{-1/2}\sigma' T_{n-1}^{-1}(1 - \alpha_0)\right).$$


20. Let $\theta_0 \in \Omega$, and let $g_0 = g(\theta_0)$. By construction, $g(\theta_0) \in \omega(X)$ if and only if $\delta_{g_0}$ does not reject
$H_{0,g_0} : g(\theta) \leq g_0$. Given $\theta = \theta_0$, the probability that $\delta_{g_0}$ does not reject $H_{0,g_0}$ is at least $\gamma$ because the
null hypothesis is true and the level of the test is $\alpha_0 = 1 - \gamma$. Hence, (9.1.15) holds.


21. Let $U = n^{1/2}(\overline{X}_n - \mu_0)/\sigma'$.

(a) We reject the null hypothesis in (9.1.22) if and only if

$$U \geq T_{n-1}^{-1}(1 - \alpha_0). \qquad \text{(S.9.3)}$$

We reject the null hypothesis in (9.1.27) if and only if

$$U \leq -T_{n-1}^{-1}(1 - \alpha_0). \qquad \text{(S.9.4)}$$

With $\alpha_0 < 0.5$, $T_{n-1}^{-1}(1 - \alpha_0) > 0$. So, (S.9.3) requires $U > 0$ while (S.9.4) requires $U < 0$. These
cannot both occur.

(b) Both (S.9.3) and (S.9.4) fail if and only if $U$ is strictly between $-T_{n-1}^{-1}(1 - \alpha_0)$ and $T_{n-1}^{-1}(1 - \alpha_0)$.
This can happen if $\overline{X}_n$ is sufficiently close to $\mu_0$. This has probability $1 - 2\alpha_0 > 0$.

(c) If $\alpha_0 > 0.5$, then $T_{n-1}^{-1}(1 - \alpha_0) < 0$, and both null hypotheses would be rejected if $U$ is between
the numbers $T_{n-1}^{-1}(1 - \alpha_0) < 0$ and $-T_{n-1}^{-1}(1 - \alpha_0) > 0$. This has probability $2\alpha_0 - 1 > 0$.


9.2 Testing Simple Hypotheses 


Commentary 


This section, and the two following, contain some traditional optimality results concerning tests of hypotheses 
about one-dimensional parameters. In this section, we present the Neyman-Pearson lemma which gives 
optimal tests for simple null hypotheses against simple alternative hypotheses. It is recommended that one 
skip this section, and the two that follow, unless one is teaching a rigorous mathematical statistics course. 
This section ends with a brief discussion of randomized tests. Randomized tests are mainly of theoretical 
interest. They only show up in one additional place in the text, namely the proof of Theorem 9.3.1. 


Solutions to Exercises 


1. According to Theorem 9.2.1, we should reject $H_0$ if $f_1(x) > f_0(x)$, not reject $H_0$ if $f_1(x) < f_0(x)$ and
do whatever we wish if $f_1(x) = f_0(x)$. Here

$$f_0(x) = \begin{cases} 0.3 & \text{if } x = 1, \\ 0.7 & \text{if } x = 0, \end{cases} \qquad f_1(x) = \begin{cases} 0.6 & \text{if } x = 1, \\ 0.4 & \text{if } x = 0. \end{cases}$$

We have $f_1(x) > f_0(x)$ if $x = 1$ and $f_1(x) < f_0(x)$ if $x = 0$. We never have $f_1(x) = f_0(x)$. So, the test
is to reject $H_0$ if $X = 1$ and not reject $H_0$ if $X = 0$.


2. (a) Theorem 9.2.1 can be applied with $a = 1$ and $b = 2$. Therefore, $H_0$ should not be rejected if
$f_1(x)/f_0(x) < 1/2$. Since $f_1(x)/f_0(x) = 2x$, the procedure is to not reject $H_0$ if $x < 1/4$ and to
reject $H_0$ if $x > 1/4$.

(b) For this procedure,

$$\alpha(\delta) = \Pr(\text{Rej. } H_0 \mid f_0) = \int_{1/4}^{1} f_0(x)\,dx = \frac{3}{4}$$

and

$$\beta(\delta) = \Pr(\text{Acc. } H_0 \mid f_1) = \int_0^{1/4} 2x\,dx = \frac{1}{16}.$$

Therefore, $\alpha(\delta) + 2\beta(\delta) = 7/8$.


3. (a) Theorem 9.2.1 can be applied with $a = 3$ and $b = 1$. Therefore, $H_0$ should not be rejected if
$f_1(x)/f_0(x) = 2x < 3$. Since all possible values of $X$ lie in the interval (0,1), and since $2x < 3$ for
all values in this interval, the optimal procedure is to not reject $H_0$ for every possible observed
value.

(b) Since $H_0$ is never rejected, $\alpha(\delta) = 0$ and $\beta(\delta) = 1$. Therefore, $3\alpha(\delta) + \beta(\delta) = 1$.


4. (a) By the Neyman-Pearson lemma, $H_0$ should be rejected if $f_1(x)/f_0(x) = 2x > k$, where $k$ is chosen
so that $\Pr(2X > k \mid f_0) = 0.1$. For $0 < k < 2$,

$$\Pr(2X > k \mid f_0) = \Pr\left(X > \frac{k}{2} \,\Big|\, f_0\right) = 1 - \frac{k}{2}.$$

If this value is to be equal to 0.1, then $k = 1.8$. Therefore, the optimal procedure is to reject $H_0$
if $2x > 1.8$ or, equivalently, if $x > 0.9$.

(b) For this procedure,

$$\beta(\delta) = \Pr(\text{Acc. } H_0 \mid f_1) = \int_0^{0.9} f_1(x)\,dx = 0.81.$$


5. (a) The conditions here are different from those of the Neyman-Pearson lemma. Rather than fixing
the value of $\alpha(\delta)$ and minimizing $\beta(\delta)$, we must here fix the value of $\beta(\delta)$ and minimize $\alpha(\delta)$.
Nevertheless, the same proof as that given for the Neyman-Pearson lemma shows that the optimal
procedure is again to reject $H_0$ if $f_1(X)/f_0(X) > k$, where $k$ is now chosen so that

$$\beta(\delta) = \Pr(\text{Acc. } H_0 \mid H_1) = \Pr\left[\frac{f_1(X)}{f_0(X)} < k \,\Big|\, H_1\right] = 0.05.$$

In this exercise,

$$f_0(X) = \frac{1}{(2\pi)^{n/2}}\exp\left[-\frac{1}{2}\sum_{i=1}^n (x_i - 3.5)^2\right]$$

and

$$f_1(X) = \frac{1}{(2\pi)^{n/2}}\exp\left[-\frac{1}{2}\sum_{i=1}^n (x_i - 5.0)^2\right].$$

Therefore,

$$\log\frac{f_1(X)}{f_0(X)} = \frac{1}{2}\left[\sum_{i=1}^n (x_i - 3.5)^2 - \sum_{i=1}^n (x_i - 5.0)^2\right]$$
$$= \frac{1}{2}\left[\sum_{i=1}^n x_i^2 - 7\sum_{i=1}^n x_i + 12.25n - \sum_{i=1}^n x_i^2 + 10\sum_{i=1}^n x_i - 25n\right]$$
$$= \frac{3}{2}n\overline{x}_n - (\text{const.}).$$

It follows that the likelihood ratio $f_1(X)/f_0(X)$ will be greater than some specified constant $k$
if and only if $\overline{x}_n$ is greater than some other constant $k'$. Therefore, the optimal procedure is to
reject $H_0$ if $\overline{x}_n > k'$, where $k'$ is chosen so that

$$\Pr(\overline{X}_n < k' \mid H_1) = 0.05.$$

We shall now determine the value of $k'$. If $H_1$ is true, then $\overline{X}_n$ will have a normal distribution
with mean 5.0 and variance $1/n$. Therefore, $Z = \sqrt{n}(\overline{X}_n - 5.0)$ will have the standard normal
distribution, and it follows that

$$\Pr(\overline{X}_n < k' \mid H_1) = \Pr[Z < \sqrt{n}(k' - 5.0)] = \Phi[\sqrt{n}(k' - 5.0)].$$

If this probability is to be equal to 0.05, then it can be found from a table of values of $\Phi$ that
$\sqrt{n}(k' - 5.0) = -1.645$. Hence, $k' = 5.0 - 1.645n^{-1/2}$.

(b) For $n = 4$, the test procedure is to reject $H_0$ if $\overline{X}_n > 5.0 - 1.645/2 = 4.1775$. Therefore,

$$\alpha(\delta) = \Pr(\text{Rej. } H_0 \mid H_0) = \Pr(\overline{X}_n > 4.1775 \mid H_0).$$

When $H_0$ is true, $\overline{X}_n$ has a normal distribution with mean 3.5 and variance $1/n = 1/4$. Therefore,
$Z = 2(\overline{X}_n - 3.5)$ will have the standard normal distribution, and

$$\alpha(\delta) = \Pr[Z > 2(4.1775 - 3.5)] = \Pr(Z > 1.355) = 1 - \Phi(1.355) = 0.0877.$$
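The table lookups in Exercise 5 can be replaced by a direct computation with the standard normal c.d.f. and quantile function. This sketch uses Python's standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

std = NormalDist()   # standard normal distribution
n = 4

# k' solves Phi(sqrt(n) * (k' - 5.0)) = 0.05, so that beta(delta) = 0.05.
k_prime = 5.0 + std.inv_cdf(0.05) / n**0.5

# alpha(delta) = Pr(Xbar_n > k' | H0), where Xbar_n ~ N(3.5, 1/n) under H0.
alpha = 1.0 - std.cdf(n**0.5 * (k_prime - 3.5))

print(f"k' = {k_prime:.4f}, alpha = {alpha:.4f}")
```

The results agree with the values 4.1775 and 0.0877 obtained from the tables, up to rounding of the quantile $-1.645$.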
6. Theorem 9.2.1 can be applied with $a = b = 1$. Therefore, $H_0$ should be rejected if $f_1(X)/f_0(X) > 1$.
If we let $y = \sum_{i=1}^n x_i$, then

$$f_1(X) = p_1^y(1 - p_1)^{n-y}$$

and

$$f_0(X) = p_0^y(1 - p_0)^{n-y}.$$

Hence,

$$\frac{f_1(X)}{f_0(X)} = \left[\frac{p_1(1 - p_0)}{p_0(1 - p_1)}\right]^y \left(\frac{1 - p_1}{1 - p_0}\right)^n.$$

But $f_1(X)/f_0(X) > 1$ if and only if $\log[f_1(X)/f_0(X)] > 0$, and this inequality will be satisfied if and
only if

$$y\log\left[\frac{p_1(1 - p_0)}{p_0(1 - p_1)}\right] + n\log\left(\frac{1 - p_1}{1 - p_0}\right) > 0.$$

Since $p_1 < p_0$ and $1 - p_0 < 1 - p_1$, the first logarithm on the left side of this relation is negative. Finally,
if we let $\overline{x}_n = y/n$, then this relation can be rewritten as follows:

$$\overline{x}_n < \frac{\log\left[\dfrac{1 - p_1}{1 - p_0}\right]}{\log\left[\dfrac{p_0(1 - p_1)}{p_1(1 - p_0)}\right]}.$$

The optimal procedure is to reject $H_0$ when this inequality is satisfied.


7. By the Neyman-Pearson lemma, $H_0$ should be rejected if $f_1(X)/f_0(X) > k$. Here,

$$f_0(X) = \frac{1}{(2\pi)^{n/2}2^{n/2}}\exp\left[-\frac{1}{4}\sum_{i=1}^n (x_i - \mu)^2\right]$$

and

$$f_1(X) = \frac{1}{(2\pi)^{n/2}3^{n/2}}\exp\left[-\frac{1}{6}\sum_{i=1}^n (x_i - \mu)^2\right].$$

Therefore,

$$\log\frac{f_1(X)}{f_0(X)} = \frac{1}{12}\sum_{i=1}^n (x_i - \mu)^2 + (\text{const.}).$$

It follows that the likelihood ratio will be greater than a specified constant $k$ if and only if $\sum_{i=1}^n (x_i -
\mu)^2$ is greater than some other constant $c$. The constant $c$ is to be chosen so that

$$\Pr\left[\sum_{i=1}^n (X_i - \mu)^2 > c \,\Big|\, H_0\right] = 0.05.$$

The value of $c$ can be determined as follows. When $H_0$ is true, $W = \sum_{i=1}^n (X_i - \mu)^2/2$ will have the $\chi^2$
distribution with $n$ degrees of freedom. Therefore,

$$\Pr\left[\sum_{i=1}^n (X_i - \mu)^2 > c \,\Big|\, H_0\right] = \Pr\left(W > \frac{c}{2}\right).$$

If this probability is to be equal to 0.05, then the value of $c/2$ can be determined from a table of
the $\chi^2$ distribution.

For $n = 8$, it is found from a table of the $\chi^2$ distribution with 8 degrees of freedom that $c/2 = 15.51$
and $c = 31.02$.


8. (a) The p.d.f.'s $f_0(x)$ and $f_1(x)$ are as sketched in Fig. S.9.3. Under $H_0$ it is impossible to obtain a
value of $X$ greater than 1, but such values are possible under $H_1$. Therefore, if a test procedure
rejects $H_0$ only if $x > 1$, then it is impossible to make an error of type 1, and $\alpha(\delta) = 0$. Also,

$$\beta(\delta) = \Pr(X < 1 \mid f_1) = \frac{1}{2}.$$

Figure S.9.3: Figure for Exercise 8a of Sec. 9.2.

(b) To have $\alpha(\delta) = 0$, we can include in the critical region only a set of points having probability 0
under $H_0$. Therefore, only points $x > 1$ can be considered. To minimize $\beta(\delta)$ we should choose
this set to have maximum probability under $H_1$. Therefore, all points $x > 1$ should be used in the
critical region.


9. As in Exercise 8, we should reject $H_0$ if at least one of the $n$ observations is greater than 1. For this
test, $\alpha(\delta) = 0$ and

$$\beta(\delta) = \Pr(\text{Acc. } H_0 \mid H_1) = \Pr(X_1 \leq 1, \ldots, X_n \leq 1 \mid H_1) = \left(\frac{1}{2}\right)^n.$$


10. (a) and (b). Theorem 9.2.1 can be applied with $a = b = 1$. The optimal procedure is to reject $H_0$ if
$f_1(X)/f_0(X) > 1$. If we let $y = \sum_{i=1}^n x_i$, then for $i = 0, 1$,

$$f_i(X) = \frac{\exp(-n\lambda_i)\lambda_i^y}{\prod_{j=1}^n (x_j!)}.$$

Therefore,

$$\log\frac{f_1(X)}{f_0(X)} = y\log\left(\frac{\lambda_1}{\lambda_0}\right) - n(\lambda_1 - \lambda_0).$$

Since $\lambda_1 > \lambda_0$, it follows that $f_1(X)/f_0(X) > 1$ if and only if $\overline{x}_n = y/n > (\lambda_1 - \lambda_0)/(\log\lambda_1 - \log\lambda_0)$.

(c) If $H_i$ is true, then $Y$ will have a Poisson distribution with mean $n\lambda_i$. For $\lambda_0 = 1/4$, $\lambda_1 = 1/2$, and
$n = 20$,

$$\frac{n(\lambda_1 - \lambda_0)}{\log\lambda_1 - \log\lambda_0} = \frac{20(0.25)}{0.69314} = 7.214.$$

Therefore, it is found from a table of the Poisson distribution with mean $20(1/4) = 5$ that

$$\alpha(\delta) = \Pr(Y > 7.214 \mid H_0) = \Pr(Y \geq 8 \mid H_0) = 0.1333.$$

Also, it is found from a table with mean $20(1/2) = 10$ that

$$\beta(\delta) = \Pr(Y \leq 7.214 \mid H_1) = \Pr(Y \leq 7 \mid H_1) = 0.2203.$$

Therefore, $\alpha(\delta) + \beta(\delta) = 0.3536$.
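The threshold and the two error probabilities in part (c) can be computed without tables. The sketch below defines a small helper, `poisson_cdf`, introduced here for illustration; the results differ from the table-based values only in the fourth decimal place because the tables are rounded.

```python
import math

def poisson_cdf(k, mean):
    """Pr(Y <= k) for Y ~ Poisson(mean), summed term by term."""
    term, total = math.exp(-mean), 0.0
    for y in range(k + 1):
        total += term
        term *= mean / (y + 1)   # Pr(Y = y + 1) from Pr(Y = y)
    return total

lam0, lam1, n = 0.25, 0.5, 20
threshold = n * (lam1 - lam0) / (math.log(lam1) - math.log(lam0))
k = math.floor(threshold)                  # reject H0 when Y >= k + 1

alpha = 1.0 - poisson_cdf(k, n * lam0)     # Pr(Y >= 8 | H0)
beta = poisson_cdf(k, n * lam1)            # Pr(Y <= 7 | H1)
print(f"threshold = {threshold:.3f}, alpha = {alpha:.4f}, beta = {beta:.4f}")
```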


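11. Theorem 9.2.1 can be applied with $a = b = 1$. The optimal procedure is to reject $H_0$ if $f_1(X)/f_0(X) >
1$. Here,

$$f_0(X) = \frac{1}{(2\pi)^{n/2}2^n}\exp\left[-\frac{1}{8}\sum_{i=1}^n (x_i + 1)^2\right]$$

and

$$f_1(X) = \frac{1}{(2\pi)^{n/2}2^n}\exp\left[-\frac{1}{8}\sum_{i=1}^n (x_i - 1)^2\right].$$

After some algebraic reduction, it can be shown that $f_1(X)/f_0(X) > 1$ if and only if $\overline{x}_n > 0$. If
$H_0$ is true, $\overline{X}_n$ will have the normal distribution with mean $-1$ and variance $4/n$. Therefore, $Z =
\sqrt{n}(\overline{X}_n + 1)/2$ will have the standard normal distribution, and

$$\alpha(\delta) = \Pr(\overline{X}_n > 0 \mid H_0) = \Pr\left(Z > \frac{1}{2}\sqrt{n}\right) = 1 - \Phi\left(\frac{1}{2}\sqrt{n}\right).$$

Similarly, if $H_1$ is true, $\overline{X}_n$ will have the normal distribution with mean 1 and variance $4/n$. Therefore,
$Z' = \sqrt{n}(\overline{X}_n - 1)/2$ will have the standard normal distribution, and

$$\beta(\delta) = \Pr(\overline{X}_n < 0 \mid H_1) = \Pr\left(Z' < -\frac{1}{2}\sqrt{n}\right) = 1 - \Phi\left(\frac{1}{2}\sqrt{n}\right).$$

Hence, $\alpha(\delta) + \beta(\delta) = 2[1 - \Phi(\sqrt{n}/2)]$. We can now use a program that computes $\Phi$ to obtain the
following results:

(a) If $n = 1$, $\alpha(\delta) + \beta(\delta) = 2(0.3085) = 0.6170$.
(b) If $n = 4$, $\alpha(\delta) + \beta(\delta) = 2(0.1587) = 0.3173$.
(c) If $n = 16$, $\alpha(\delta) + \beta(\delta) = 2(0.0228) = 0.0455$.
(d) If $n = 36$, $\alpha(\delta) + \beta(\delta) = 2(0.0013) = 0.0027$.

Slight discrepancies appear above due to rounding after multiplying by 2 rather than before.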
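One possible form of "a program that computes $\Phi$" for this exercise is sketched below, using Python's standard-library `statistics.NormalDist`. The function name `total_error` is illustrative.

```python
from statistics import NormalDist

def total_error(n):
    """alpha(delta) + beta(delta) = 2 * [1 - Phi(sqrt(n) / 2)]."""
    return 2.0 * (1.0 - NormalDist().cdf(n**0.5 / 2.0))

for n in (1, 4, 16, 36):
    print(f"n = {n:2d}   alpha + beta = {total_error(n):.4f}")
```

Because this computes the sum before rounding, the last digit for $n = 1$ can differ from the value obtained by doubling the rounded tail probability 0.3085.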


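12. In the notation of this section, $f_i(x) = \theta_i^n\exp\left(-\theta_i\sum_{j=1}^n x_j\right)$ for $i = 0, 1$. The desired test has the
following form: reject $H_0$ if $f_1(x)/f_0(x) > k$ where $k$ is chosen so that the probability of rejecting $H_0$
is $\alpha_0$ given $\theta = \theta_0$. The ratio of $f_1$ to $f_0$ is

$$\frac{f_1(x)}{f_0(x)} = \left(\frac{\theta_1}{\theta_0}\right)^n\exp\left([\theta_0 - \theta_1]\sum_{i=1}^n x_i\right).$$

Since $\theta_0 < \theta_1$, the above ratio will be greater than $k$ if and only if $\sum_{i=1}^n x_i$ is less than some other
constant, $c$. That $c$ is chosen so that $\Pr\left(\sum_{i=1}^n X_i < c \mid \theta = \theta_0\right) = \alpha_0$. The distribution of $\sum_{i=1}^n X_i$ given
$\theta = \theta_0$ is the gamma distribution with parameters $n$ and $\theta_0$. Hence, $c$ must be the $\alpha_0$ quantile of that
distribution.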


13. (a) The test rejects $H_0$ if $f_0(X) < f_1(X)$. In this case, $f_0(x) = \exp(-[x_1 + x_2]/2)/4$, and $f_1(x) =
4/(2 + x_1 + x_2)^3$ for both $x_1 > 0$ and $x_2 > 0$. Let $T = X_1 + X_2$. Then we reject $H_0$ if

$$\exp(-T/2)/4 < 4/(2 + T)^3. \qquad \text{(S.9.5)}$$

(b) If $X_1 = 4$ and $X_2 = 3$ are observed, then $T = 7$. The inequality in (S.9.5) is $\exp(-7/2)/4 < 4/9^3$
or $0.007549 < 0.00549$, which is false, so we do not reject $H_0$.

(c) If $H_0$ is true, then $T$ is the sum of two independent exponential random variables with parameter
1/2. Hence, it has the gamma distribution with parameters 2 and 1/2 by Theorem 5.7.7.

(d) The test is to reject $H_0$ if $f_1(X)/f_0(X) > c$, where $c$ is chosen so that the probability is 0.01 that
we reject $H_0$ given $\theta = \theta_0$. We can write

$$\frac{f_1(X)}{f_0(X)} = \frac{16\exp(T/2)}{(2 + T)^3}. \qquad \text{(S.9.6)}$$

The function on the right side of (S.9.6) takes the value 2 at $T = 0$, decreases to the value
0.5473 at $T = 4$, and increases for $T > 4$. Let $G$ be the c.d.f. of the gamma distribution with
parameters 2 and 1/2 (also the $\chi^2$ distribution with 4 degrees of freedom). The level 0.01 test
will reject $H_0$ if $T < c_1$ or $T > c_2$ where $c_1$ and $c_2$ satisfy $G(c_1) + 1 - G(c_2) = 0.01$, and either
$16\exp(c_1/2)/(2 + c_1)^3 = 16\exp(c_2/2)/(2 + c_2)^3$ or $c_1 = 0$ and $16\exp(c_2/2)/(2 + c_2)^3 > 2$. It follows
that $1 - G(c_2) \leq 0.01$, that is, $c_2 \geq G^{-1}(0.99) = 13.28$. But

$$16\exp(13.28/2)/(2 + 13.28)^3 = 3.4 > 2.$$

It follows that $c_1 = 0$ and the test is to reject $H_0$ if $T \geq 13.28$.

(e) If $X_1 = 4$ and $X_2 = 3$, then $T = 7$ and we do not reject $H_0$.
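The numerical claims in Exercise 13 can be checked with the closed-form c.d.f. of the gamma distribution with parameters 2 and 1/2, namely $G(t) = 1 - \exp(-t/2)(1 + t/2)$. The following is a sketch; the function names `ratio` and `G` are illustrative.

```python
import math

def ratio(t):
    """Likelihood ratio f1/f0 as a function of T = X1 + X2, as in (S.9.6)."""
    return 16.0 * math.exp(t / 2.0) / (2.0 + t) ** 3

def G(t):
    """c.d.f. of the gamma(2, 1/2) distribution (chi-square with 4 d.f.)."""
    return 1.0 - math.exp(-t / 2.0) * (1.0 + t / 2.0)

print(ratio(0.0), ratio(4.0))        # value at T = 0 and the minimum at T = 4
print(1.0 - G(13.28))                # upper-tail probability at 13.28
print(ratio(13.28))                  # exceeds ratio(0) = 2, so c1 = 0
print(ratio(7.0) > 1.0)              # part (b): f1 > f0 fails at T = 7
```

Since `ratio(13.28)` is larger than the value 2 attained at $T = 0$, no lower rejection region is needed, which is the conclusion $c_1 = 0$ in part (d).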
9.3 Uniformly Most Powerful Tests


Commentary 


This section introduces the concept of monotone likelihood ratio, which is used to provide conditions under 
which uniformly most powerful tests exist for one-sided hypotheses. One may safely skip this section if one 
is not teaching a rigorous mathematical statistics course. One step in the proof of Theorem 9.3.1 relies on 
randomized tests (Sec. 9.2), which the instructor might have skipped earlier. 


Solutions to Exercises 


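1. Let $y = \sum_{i=1}^n x_i$. Then the joint p.f. is

$$f_n(X \mid \lambda) = \frac{\exp(-n\lambda)\lambda^y}{\prod_{i=1}^n (x_i!)}.$$

Therefore, for $0 < \lambda_1 < \lambda_2$,

$$\frac{f_n(X \mid \lambda_2)}{f_n(X \mid \lambda_1)} = \exp(-n(\lambda_2 - \lambda_1))\left(\frac{\lambda_2}{\lambda_1}\right)^y,$$

which is an increasing function of $y$.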


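2. Let $y = \sum_{i=1}^n (x_i - \mu)^2$. Then the joint p.d.f. is

$$f_n(X \mid \sigma^2) = \frac{1}{(2\pi)^{n/2}\sigma^n}\exp\left(-\frac{y}{2\sigma^2}\right).$$

Therefore, for $0 < \sigma_1^2 < \sigma_2^2$,

$$\frac{f_n(X \mid \sigma_2^2)}{f_n(X \mid \sigma_1^2)} = \frac{\sigma_1^n}{\sigma_2^n}\exp\left\{\frac{1}{2}\left(\frac{1}{\sigma_1^2} - \frac{1}{\sigma_2^2}\right)y\right\},$$

which is an increasing function of $y$.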


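3. Let $y = \prod_{i=1}^n x_i$ and let $z = \sum_{i=1}^n x_i$. Then the joint p.d.f. is

$$f_n(X \mid \alpha) = \frac{\beta^{n\alpha}}{[\Gamma(\alpha)]^n}\,y^{\alpha - 1}\exp(-\beta z).$$

Therefore, for $0 < \alpha_1 < \alpha_2$,

$$\frac{f_n(X \mid \alpha_2)}{f_n(X \mid \alpha_1)} = (\text{const.})\,y^{\alpha_2 - \alpha_1},$$

which is an increasing function of $y$.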


Copyright © 2012 Pearson Education, Inc. Publishing as Addison-Wesley. 




4. The joint p.d.f. $f_n(X \mid \beta)$ in this exercise is the same as the joint p.d.f. $f_n(X \mid \alpha)$ given in Exercise 3,
except that the value of $\beta$ is now unknown and the value of $\alpha$ is known. Since $z = n\overline{x}_n$, it follows that
for $0 < \beta_1 < \beta_2$,

$$\frac{f_n(X \mid \beta_2)}{f_n(X \mid \beta_1)} = (\text{const.})\exp([\beta_1 - \beta_2]n\overline{x}_n).$$

The expression on the right side of this relation is a decreasing function of $\overline{x}_n$, because $\beta_1 - \beta_2 < 0$.
Therefore, this expression is an increasing function of $-\overline{x}_n$.


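5. Let $y = \sum_{i=1}^n d(x_i)$. Then the joint p.d.f. or the joint p.f. is

$$f_n(X \mid \theta) = [a(\theta)]^n\left[\prod_{i=1}^n b(x_i)\right]\exp[c(\theta)y].$$

Therefore, for $\theta_1 < \theta_2$,

$$\frac{f_n(X \mid \theta_2)}{f_n(X \mid \theta_1)} = \left[\frac{a(\theta_2)}{a(\theta_1)}\right]^n\exp\{[c(\theta_2) - c(\theta_1)]y\}.$$

Since $c(\theta_2) - c(\theta_1) > 0$, this expression is an increasing function of $y$.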


6. Let $\theta_1 < \theta_2$. The range of possible values of $r(X) = \max\{X_1, \ldots, X_n\}$ is the interval $[0, \theta_2]$ when
comparing $\theta_1$ and $\theta_2$. The likelihood ratio for values of $r(x)$ in this interval is

$$\begin{cases} \dfrac{\theta_1^n}{\theta_2^n} & \text{if } 0 \leq r(x) \leq \theta_1, \\ \infty & \text{if } \theta_1 < r(x) \leq \theta_2. \end{cases}$$

This is monotone increasing, even though it takes only two values. It does take the larger value $\infty$
when $r(x)$ is large and it takes the smaller value $\theta_1^n/\theta_2^n$ when $r(x)$ is small.


7. No matter what the true value of $\theta$ is, the probability that $H_0$ will be rejected is 0.05. Therefore, the
value of the power function at every value of $\theta$ is 0.05.


8. We know from Exercise 2 that the joint p.d.f. of $X_1, \ldots, X_n$ has a monotone likelihood ratio in the
statistic $\sum_{i=1}^n X_i^2$. Therefore, by Theorem 9.3.1, a test which rejects $H_0$ when $\sum_{i=1}^n X_i^2 \geq c$ will be a
UMP test. To achieve a specified level of significance $\alpha_0$, the constant $c$ should be chosen so that
$\Pr\left(\sum_{i=1}^n X_i^2 \geq c \mid \sigma^2 = 2\right) = \alpha_0$. Since $\sum_{i=1}^n X_i^2$ has a continuous distribution and not a discrete
distribution, there will be a value of $c$ which satisfies this equation for any specified value of $\alpha_0$ ($0 < \alpha_0 < 1$).


9. The first part of this exercise was answered in Exercise 8. When $n = 10$ and $\sigma^2 = 2$, the distribution
of $Y = \sum_{i=1}^n X_i^2/2$ will be the $\chi^2$ distribution with 10 degrees of freedom, and it is found from a table of
this distribution that $\Pr(Y \geq 18.31) = 0.05$. Also,

$$\Pr\left(\sum_{i=1}^n X_i^2 \geq c \,\Big|\, \sigma^2 = 2\right) = \Pr\left(Y \geq \frac{c}{2}\right).$$

Therefore, if this probability is to be equal to 0.05, then $c/2 = 18.31$ or $c = 36.62$.


10. Let $Y = \sum_{i=1}^n X_i$. As in Example 9.3.7, a test which specifies rejecting $H_0$ if $Y \geq c$ is a UMP test. When
$n = 20$ and $p = 1/2$, it is found from the table of the binomial distribution given at the end of the book
that

$$\Pr(Y \geq 14) = .0370 + .0148 + .0046 + .0011 + .0002 = .0577.$$

Therefore, the level of significance of the UMP test which rejects $H_0$ when $Y \geq 14$ will be 0.0577.
Similarly, the UMP test that rejects $H_0$ when $Y \geq 15$ has level of significance 0.0207.


11. It is known from Exercise 1 that the joint p.f. of $X_1, \ldots, X_n$ has a monotone likelihood ratio in the
statistic $Y = \sum_{i=1}^n X_i$. Therefore, by Theorem 9.3.1, a test which rejects $H_0$ when $Y \geq c$ will be a UMP
test. When $\lambda = 1$ and $n = 10$, $Y$ will have a Poisson distribution with mean 10, and it is found from
the table of the Poisson distribution given at the end of this book that

$$\Pr(Y \geq 18) = .0071 + .0037 + .0019 + .0009 + .0004 + .0002 + .0001 = .0143.$$

Therefore, the level of significance of the UMP test which rejects $H_0$ when $Y \geq 18$ will be 0.0143.


12. Change the parameter from $\theta$ to $\zeta = -\theta$. In terms of the new parameter $\zeta$, the hypotheses to be tested
are:

$$H_0 : \zeta \leq -\theta_0,$$
$$H_1 : \zeta > -\theta_0.$$

Let $g_n(X \mid \zeta) = f_n(X \mid -\zeta)$ denote the joint p.d.f. or the joint p.f. of $X_1, \ldots, X_n$ when $\zeta$ is regarded
as the parameter. If $\zeta_1 < \zeta_2$, then $\theta_1 = -\zeta_1 > -\zeta_2 = \theta_2$. Therefore, the ratio $g_n(X \mid \zeta_2)/g_n(X \mid \zeta_1)$
will be a decreasing function of $r(X)$. It follows that this ratio will be an increasing function of the
statistic $s(X) = -r(X)$.

Thus, in terms of $\zeta$, the hypotheses have the same form as the hypotheses (9.3.8) and $g_n(X \mid \zeta)$ has a
monotone likelihood ratio in the statistic $s(X)$. Therefore, by Theorem 9.3.1, a test which rejects $H_0$
when $s(X) \geq c'$, for some constant $c'$, will be a UMP test. But $s(X) \geq c'$ if and only if $T = r(X) \leq c$,
where $c = -c'$. Therefore, the test which rejects $H_0$ when $T \leq c$ will be a UMP test. If $c$ is chosen to
satisfy the relation given in the exercise, then it follows from Theorem 9.3.1 that the level of significance
of this test will be $\alpha_0$.


13. (a) By Exercise 12, the test which rejects $H_0$ when $\overline{X}_n \leq c$ will be a UMP test. For the level of
significance to be 0.1, $c$ should be chosen so that $\Pr(\overline{X}_n \leq c \mid \mu = 10) = 0.1$. In this exercise,
$n = 4$. When $\mu = 10$, the random variable $Z = 2(\overline{X}_n - 10)$ has a standard normal distribution
and $\Pr(\overline{X}_n \leq c \mid \mu = 10) = \Pr[Z \leq 2(c - 10)]$. It is found from a table of the standard normal
distribution that $\Pr(Z \leq -1.282) = 0.1$. Therefore, $2(c - 10) = -1.282$ or $c = 9.359$.

(b) When $\mu = 9$, the random variable $Z = 2(\overline{X}_n - 9)$ has the standard normal distribution. Therefore, the
power of the test is

$$\Pr(\overline{X}_n \leq 9.359 \mid \mu = 9) = \Pr(Z \leq 0.718) = \Phi(0.718) = 0.7636,$$

where we have interpolated in the table of the normal distribution between 0.71 and 0.72.

(c) When $\mu = 11$, the random variable $Z = 2(\overline{X}_n - 11)$ has the standard normal distribution. Therefore,
the probability of not rejecting $H_0$ is

$$\Pr(\overline{X}_n \geq 9.359 \mid \mu = 11) = \Pr(Z \geq -3.282) = \Pr(Z \leq 3.282) = \Phi(3.282) = 0.9995.$$


14. By Exercise 12, a test which rejects $H_0$ when $\sum_{i=1}^{n} X_i \le c$ will be a UMP test. When $n = 10$ and $\lambda = 1$, $\sum_{i=1}^{n} X_i$ will have a Poisson distribution with mean 10 and $\alpha_0 = \Pr\left(\sum_{i=1}^{n} X_i \le c \mid \lambda = 1\right)$. From a table of the Poisson distribution, the following values of $\alpha_0$ are obtained.

c = 0, $\alpha_0$ = .0000;
c = 1, $\alpha_0$ = .0000 + .0005 = .0005;
c = 2, $\alpha_0$ = .0000 + .0005 + .0023 = .0028;
c = 3, $\alpha_0$ = .0000 + .0005 + .0023 + .0076 = .0104;
c = 4, $\alpha_0$ = .0000 + .0005 + .0023 + .0076 + .0189 = .0293.

For larger values of $c$, $\alpha_0 > 0.03$.
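The table lookups above can be checked numerically. The following Python sketch is not part of the original solution; the helper name `poisson_cdf` is our own. It sums the Poisson p.f. with mean 10 to obtain the size of the test for each candidate $c$.

```python
import math

def poisson_cdf(c, mean):
    """Pr(Y <= c) for Y ~ Poisson(mean), summed term by term."""
    return sum(math.exp(-mean) * mean**k / math.factorial(k) for k in range(c + 1))

# Size of the test that rejects H0 when the sum of the X_i is <= c
# (n = 10 and lambda = 1, so the sum has mean 10).
for c in range(6):
    print(c, round(poisson_cdf(c, 10.0), 4))
```

The last row printed (c = 5) is the first one whose size exceeds 0.03, which is the cutoff used in the solution.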


15. By Exercise 4, the joint p.d.f. of $X_1, \ldots, X_n$ has a monotone likelihood ratio in the statistic $-\bar{X}_n$. Therefore, by Exercise 12, a test which rejects $H_0$ when $-\bar{X}_n \le c'$, for some constant $c'$, will be a UMP test. But this test is equivalent to a test which rejects $H_0$ when $\bar{X}_n \ge c$, where $c = -c'$. Since $\bar{X}_n$ has a continuous distribution, for any specified value of $\alpha_0$ $(0 < \alpha_0 < 1)$ there exists a value of $c$ such that $\Pr(\bar{X}_n \ge c \mid \beta = 1/2) = \alpha_0$.


16. We must find a constant $c$ such that when $n = 10$, $\Pr(\bar{X}_n \ge c \mid \beta = 1/2) = 0.05$. When $\beta = 1/2$, each observation $X_i$ has an exponential distribution with $\beta = 1/2$, which is a gamma distribution with parameters $\alpha = 1$ and $\beta = 1/2$. Therefore, $\sum_{i=1}^{n} X_i$ has a gamma distribution with parameters $\alpha = n = 10$ and $\beta = 1/2$, which is a $\chi^2$ distribution with $2n = 20$ degrees of freedom. But

$$\Pr\left(\bar{X}_n \ge c \mid \beta = \tfrac{1}{2}\right) = \Pr\left(\sum_{i=1}^{n} X_i \ge 10c \mid \beta = \tfrac{1}{2}\right).$$

It is found from a table of the $\chi^2$ distribution with 20 degrees of freedom that $\Pr\left(\sum_{i=1}^{n} X_i \ge 31.41\right) = 0.05$. Therefore, $10c = 31.41$ and $c = 3.141$.
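As a check on the table value, the upper tail of a $\chi^2$ distribution with an even number of degrees of freedom has a closed form as a Poisson sum. The sketch below is our own addition (the function name `chi2_sf_even` is hypothetical) and confirms that the tail probability beyond 31.41 with 20 degrees of freedom is about 0.05.

```python
import math

def chi2_sf_even(x, df):
    """Pr(chi-square with an even df exceeds x), using
    Pr(chi2_{2n} > x) = Pr(Poisson(x/2) <= n - 1)."""
    if df % 2 != 0:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = x / 2.0
    return sum(math.exp(-half) * half**k / math.factorial(k) for k in range(df // 2))

n = 10
c = 31.41 / n                       # critical value for the sample mean
tail = chi2_sf_even(31.41, 2 * n)   # should be close to 0.05
print(c, tail)
```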


17. In this exercise, $H_0$ is a simple hypothesis. By the Neyman-Pearson lemma, the test which has maximum power at a particular alternative value $\theta_1 > 0$ will reject $H_0$ if $f(x \mid \theta = \theta_1)/f(x \mid \theta = 0) > c$, where $c$ is chosen so that the probability that this inequality will be satisfied when $\theta = 0$ is $\alpha_0$. Here,

$$\frac{f(x \mid \theta = \theta_1)}{f(x \mid \theta = 0)} > c$$

if and only if $(1 - c)x^2 + 2c\theta_1 x > c\theta_1^2 - (1 - c)$. For each value of $\theta_1$, the value of $c$ is to be chosen so that the set of points satisfying this inequality has probability $\alpha_0$ when $\theta = 0$. For two different values of $\theta_1$, these two sets will be different. Therefore, different test procedures will maximize the power at the two different values of $\theta_1$. Hence, no single test is a UMP test.
18. The UMP test will reject $H_0$ when $\bar{X}_n \ge c$, where $\Pr(\bar{X}_n \ge c \mid \mu = 0) = \Pr(\sqrt{n}\,\bar{X}_n \ge \sqrt{n}\,c \mid \mu = 0) = 0.025$. However, when $\mu = 0$, $\sqrt{n}\,\bar{X}_n$ has the standard normal distribution. Therefore, $\Pr(\sqrt{n}\,\bar{X}_n \ge 1.96 \mid \mu = 0) = 0.025$. It follows that $\sqrt{n}\,c = 1.96$ and $c = 1.96 n^{-1/2}$.

(a) When $\mu = 0.5$, the random variable $Z = \sqrt{n}(\bar{X}_n - 0.5)$ has the standard normal distribution. Therefore,

$$\pi(0.5 \mid \delta^*) = \Pr(\bar{X}_n \ge 1.96 n^{-1/2} \mid \mu = 0.5) = \Pr(Z \ge 1.96 - 0.5 n^{1/2}) = \Pr(Z \le 0.5 n^{1/2} - 1.96) = \Phi(0.5 n^{1/2} - 1.96).$$

But $\Phi(1.282) = 0.9$. Therefore, $\pi(0.5 \mid \delta^*) \ge 0.9$ if and only if $0.5 n^{1/2} - 1.96 \ge 1.282$, or, equivalently, if and only if $n \ge 42.042$. Thus, a sample of size $n = 43$ is required in order to have $\pi(0.5 \mid \delta^*) \ge 0.9$. Since the power function is a strictly increasing function of $\mu$, it will then also be true that $\pi(\mu \mid \delta^*) \ge 0.9$ for $\mu > 0.5$.

(b) When $\mu = -0.1$, the random variable $Z = \sqrt{n}(\bar{X}_n + 0.1)$ has the standard normal distribution. Therefore,

$$\pi(-0.1 \mid \delta^*) = \Pr(\bar{X}_n \ge 1.96 n^{-1/2} \mid \mu = -0.1) = \Pr(Z \ge 1.96 + 0.1 n^{1/2}) = 1 - \Phi(1.96 + 0.1 n^{1/2}).$$

But $\Phi(3.10) = 0.999$. Therefore, $\pi(-0.1 \mid \delta^*) \le 0.001$ if and only if $1.96 + 0.1 n^{1/2} \ge 3.10$ or, equivalently, if and only if $n \ge 129.96$. Thus, a sample of size $n = 130$ is required in order to have $\pi(-0.1 \mid \delta^*) \le 0.001$. Since the power function is a strictly increasing function of $\mu$, it will then also be true that $\pi(\mu \mid \delta^*) \le 0.001$ for $\mu < -0.1$.
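The two sample-size calculations can be reproduced as follows. This is a sketch we have added, using Python's standard `statistics.NormalDist` for $\Phi$; the function name `power` is our own. It uses the same table values (1.96, 1.282, 3.10) as the solution, so the exact normal quantiles would give slightly different bounds.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal

def power(mu, n, z_crit=1.96):
    """Pr(Xbar_n >= z_crit / sqrt(n) | mu) when the variance is 1."""
    return 1.0 - Z.cdf(z_crit - mu * math.sqrt(n))

# (a) need 0.5*sqrt(n) - 1.96 >= 1.282
n_a = math.ceil(((1.96 + 1.282) / 0.5) ** 2)
# (b) need 1.96 + 0.1*sqrt(n) >= 3.10
n_b = math.ceil(((3.10 - 1.96) / 0.1) ** 2)

print(n_a, power(0.5, n_a))
print(n_b, power(-0.1, n_b))
```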


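19. (a) Let $f(x \mid \mu)$ be the joint p.d.f. of $X$ given $\mu$. For each set $A$ and $i = 0, 1$,

$$\Pr(X \in A \mid \mu = \mu_i) = \int \cdots \int_A f(x \mid \mu_i)\,dx. \qquad (S.9.7)$$

It is clear that $f(x \mid \mu_0) > 0$ for all $x$ and so is $f(x \mid \mu_1) > 0$ for all $x$. Hence (S.9.7) is strictly positive for $i = 0$ if and only if it is strictly positive for $i = 1$.

(b) Both $\delta$ and $\delta_1$ are size $\alpha_0$ tests of $H_0': \mu = \mu_0$ versus $H_1': \mu > \mu_0$. Let

A = {x : $\delta$ rejects but $\delta_1$ does not reject},
B = {x : $\delta$ does not reject but $\delta_1$ rejects},
C = {x : both tests reject}.

Because both tests have the same size, it must be the case that

$$\Pr(X \in A \mid \mu = \mu_0) + \Pr(X \in C \mid \mu = \mu_0) = \alpha_0 = \Pr(X \in B \mid \mu = \mu_0) + \Pr(X \in C \mid \mu = \mu_0).$$

Hence,

$$\Pr(X \in A \mid \mu = \mu_0) = \Pr(X \in B \mid \mu = \mu_0). \qquad (S.9.8)$$

Because of the MLR and the form of the test $\delta_1$, we know that there is a constant $c$ such that for every $\mu > \mu_0$ and every $x \in B$ and every $y \in A$,

$$\frac{f(x \mid \mu)}{f(x \mid \mu_0)} \ge c \ge \frac{f(y \mid \mu)}{f(y \mid \mu_0)}. \qquad (S.9.9)$$

Now,

$$\pi(\mu \mid \delta) = \int \cdots \int_A f(x \mid \mu)\,dx + \int \cdots \int_C f(x \mid \mu)\,dx.$$

Also,

$$\pi(\mu \mid \delta_1) = \int \cdots \int_B f(x \mid \mu)\,dx + \int \cdots \int_C f(x \mid \mu)\,dx.$$

It follows that, for $\mu > \mu_0$,

$$\pi(\mu \mid \delta_1) - \pi(\mu \mid \delta) = \int \cdots \int_B f(x \mid \mu)\,dx - \int \cdots \int_A f(x \mid \mu)\,dx$$
$$= \int \cdots \int_B \frac{f(x \mid \mu)}{f(x \mid \mu_0)} f(x \mid \mu_0)\,dx - \int \cdots \int_A \frac{f(x \mid \mu)}{f(x \mid \mu_0)} f(x \mid \mu_0)\,dx$$
$$\ge c \int \cdots \int_B f(x \mid \mu_0)\,dx - c \int \cdots \int_A f(x \mid \mu_0)\,dx = 0,$$

where the inequality follows from (S.9.9), and the final equality follows from (S.9.8).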


9.4 Two-Sided Alternatives 


Commentary 


This section considers tests for simple (and interval) null hypotheses against two-sided alternative hypotheses. 
The concept of unbiased tests is introduced in a subsection at the end. Even students in a mathematical 
statistics course may have trouble with the concept of unbiased test. 


Solutions to Exercises 


1. If $\pi(\mu \mid \delta)$ is to be symmetric with respect to the point $\mu = \mu_0$, then the constants $c_1$ and $c_2$ must be chosen to be symmetric with respect to the value $\mu_0$. Let $c_1 = \mu_0 - k$ and $c_2 = \mu_0 + k$. When $\mu = \mu_0$, the random variable $Z = n^{1/2}(\bar{X}_n - \mu_0)$ has the standard normal distribution. Therefore,

$$\pi(\mu_0 \mid \delta) = \Pr(\bar{X}_n \le \mu_0 - k \mid \mu_0) + \Pr(\bar{X}_n \ge \mu_0 + k \mid \mu_0)$$
$$= \Pr(Z \le -n^{1/2}k) + \Pr(Z \ge n^{1/2}k) = 2\Pr(Z \ge n^{1/2}k) = 2[1 - \Phi(n^{1/2}k)].$$

Since $k$ must be chosen so that $\pi(\mu_0 \mid \delta) = 0.10$, it follows that $\Phi(n^{1/2}k) = 0.95$. Therefore, $n^{1/2}k = 1.645$ and $k = 1.645 n^{-1/2}$.


2. When $\mu = \mu_0$, the random variable $Z = n^{1/2}(\bar{X}_n - \mu_0)$ has the standard normal distribution. Therefore,

$$\pi(\mu_0 \mid \delta) = \Pr(\bar{X}_n \le c_1 \mid \mu_0) + \Pr(\bar{X}_n \ge c_2 \mid \mu_0)$$
$$= \Pr(Z \le -1.96) + \Pr[Z \ge n^{1/2}(c_2 - \mu_0)]$$
$$= \Phi(-1.96) + 1 - \Phi[n^{1/2}(c_2 - \mu_0)]$$
$$= 1.025 - \Phi[n^{1/2}(c_2 - \mu_0)].$$

If we are to have $\pi(\mu_0 \mid \delta) = 0.10$, then we must have $\Phi[n^{1/2}(c_2 - \mu_0)] = 0.925$. Therefore, $n^{1/2}(c_2 - \mu_0) = 1.439$ and $c_2 = \mu_0 + 1.439 n^{-1/2}$.


3. From Exercise 1, we know that if $c_1 = \mu_0 - 1.645 n^{-1/2}$ and $c_2 = \mu_0 + 1.645 n^{-1/2}$, then $\pi(\mu_0 \mid \delta) = 0.10$ and, by symmetry, $\pi(\mu_0 + 1 \mid \delta) = \pi(\mu_0 - 1 \mid \delta)$. Also, when $\mu = \mu_0 + 1$, the random variable $Z = n^{1/2}(\bar{X}_n - \mu_0 - 1)$ has the standard normal distribution. Therefore,

$$\pi(\mu_0 + 1 \mid \delta) = \Pr(\bar{X}_n \le c_1 \mid \mu_0 + 1) + \Pr(\bar{X}_n \ge c_2 \mid \mu_0 + 1)$$
$$= \Pr(Z \le -1.645 - n^{1/2}) + \Pr(Z \ge 1.645 - n^{1/2})$$
$$= \Phi(-1.645 - n^{1/2}) + \Phi(n^{1/2} - 1.645).$$

For n = 9, $\pi(\mu_0 + 1 \mid \delta) = \Phi(-4.645) + \Phi(1.355) < 0.95$.
For n = 10, $\pi(\mu_0 + 1 \mid \delta) = \Phi(-4.807) + \Phi(1.517) < 0.95$.
For n = 11, $\pi(\mu_0 + 1 \mid \delta) = \Phi(-4.962) + \Phi(1.672) > 0.95$.
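The search over $n$ above is easy to automate. The following sketch is our own (the function name is hypothetical); it evaluates the power at $\mu_0 + 1$ for increasing $n$ and stops at the first value that reaches 0.95.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal c.d.f.

def power_at_mu0_plus_1(n, z=1.645):
    """Power of the symmetric two-sided test at mu = mu0 + 1 (variance 1)."""
    root_n = math.sqrt(n)
    return PHI(-z - root_n) + PHI(root_n - z)

n = 1
while power_at_mu0_plus_1(n) < 0.95:
    n += 1
print(n, power_at_mu0_plus_1(n))
```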


4. If we choose $c_1$ and $c_2$ to be symmetric with respect to the value 0.15, then it will be true that $\pi(0.1 \mid \delta) = \pi(0.2 \mid \delta)$. Accordingly, let $c_1 = 0.15 - k$ and $c_2 = 0.15 + k$. When $\mu = 0.1$, the random variable $Z = 5(\bar{X}_n - 0.1)$ has a standard normal distribution. Therefore,

$$\pi(0.1 \mid \delta) = \Pr(\bar{X}_n \le c_1 \mid 0.1) + \Pr(\bar{X}_n \ge c_2 \mid 0.1)$$
$$= \Pr(Z \le 0.25 - 5k) + \Pr(Z \ge 0.25 + 5k)$$
$$= \Phi(0.25 - 5k) + \Phi(-0.25 - 5k).$$

We must choose $k$ so that $\pi(0.1 \mid \delta) = 0.07$. By trial and error, using the table of the standard normal distribution, we find that when $5k = 1.867$,

$$\pi(0.1 \mid \delta) = \Phi(-1.617) + \Phi(-2.117) = 0.0529 + 0.0171 = 0.07.$$

Hence, $k = 0.3734$.
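The trial-and-error step can be replaced by bisection, since the power at 0.1 is a decreasing function of $5k$. This is a minimal sketch of our own, not the method used in the solution; the names `power_at_01`, `lo`, and `hi` are ours.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal c.d.f.

def power_at_01(t):
    """pi(0.1 | delta) as a function of t = 5k."""
    return PHI(0.25 - t) + PHI(-0.25 - t)

# power_at_01 decreases from 1 at t = 0 to nearly 0 at t = 5,
# so the root of power_at_01(t) = 0.07 is bracketed by [0, 5].
lo, hi = 0.0, 5.0
for _ in range(60):
    mid = (lo + hi) / 2
    if power_at_01(mid) > 0.07:
        lo = mid
    else:
        hi = mid
k = lo / 5
print(k)
```

The result agrees with the table-based value $k = 0.3734$ to about three decimal places.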


5. As in Exercise 4,

$$\pi(0.1 \mid \delta) = \Pr[Z \le 5(c_1 - 0.1)] + \Pr[Z \ge 5(c_2 - 0.1)] = \Phi(5c_1 - 0.5) + \Phi(0.5 - 5c_2).$$

Similarly,

$$\pi(0.2 \mid \delta) = \Pr[Z \le 5(c_1 - 0.2)] + \Pr[Z \ge 5(c_2 - 0.2)] = \Phi(5c_1 - 1) + \Phi(1 - 5c_2).$$

Hence, the following two equations must be solved simultaneously:

$$\Phi(5c_1 - 0.5) + \Phi(0.5 - 5c_2) = 0.02,$$
$$\Phi(5c_1 - 1) + \Phi(1 - 5c_2) = 0.05.$$

By trial and error, using the table of the standard normal distribution, it is found ultimately that if $5c_1 = -2.12$ and $5c_2 = 2.655$, then

$$\Phi(5c_1 - 0.5) + \Phi(0.5 - 5c_2) = \Phi(-2.62) + \Phi(-2.155) = 0.0044 + 0.0155 = 0.02$$

and

$$\Phi(5c_1 - 1) + \Phi(1 - 5c_2) = \Phi(-3.12) + \Phi(-1.655) = 0.0009 + 0.0490 = 0.05.$$
6. Let $T = \max(X_1, \ldots, X_n)$. Then

$$f_n(X \mid \theta) = \begin{cases} 1/\theta^n & \text{for } 0 \le t \le \theta, \\ 0 & \text{otherwise.} \end{cases}$$

Therefore, for $\theta_1 < \theta_2$,

$$\frac{f_n(X \mid \theta_2)}{f_n(X \mid \theta_1)} = \begin{cases} (\theta_1/\theta_2)^n & \text{for } 0 \le t \le \theta_1, \\ \infty & \text{for } \theta_1 < t \le \theta_2. \end{cases}$$

It can be seen from this relationship that $f_n(X \mid \theta)$ has a monotone likelihood ratio in the statistic $T$ (although we are being somewhat nonrigorous by treating $\infty$ as a number).

For any constant $c$ $(0 < c \le 3)$, $\Pr(T \ge c \mid \theta = 3) = 1 - (c/3)^n$. Therefore, to achieve a given level of significance $\alpha_0$, we should choose $c = 3(1 - \alpha_0)^{1/n}$. It follows from Theorem 9.3.1 that the corresponding test will be a UMP test.


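7. For $\theta > 0$, the power function is $\pi(\theta \mid \delta) = \Pr(T \ge c \mid \theta)$. Hence,

$$\pi(\theta \mid \delta) = \begin{cases} 0 & \text{for } \theta \le c, \\ 1 - (c/\theta)^n & \text{for } \theta > c. \end{cases}$$

The plot is in Fig. S.9.4.


Figure S.9.4: Figure for Exercise 7 of Sec. 9.4.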


8. (a) It follows from Exercise 8 of Sec. 9.3 that the specified test will be a UMP test.
(b) For any given value of $c$ $(0 < c < 3)$, $\Pr(T \le c \mid \theta = 3) = (c/3)^n$. Therefore, to achieve a given level of significance $\alpha_0$, we should choose $c = 3\alpha_0^{1/n}$.


9. A sketch is given in Fig. S.9.5. 
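10. (a) Let $\alpha_0 = 0.05$ and let $c_1 = 3\alpha_0^{1/n}$ as in Exercise 8. Also, let $c_2 = 3$. Then

$$\pi(\theta \mid \delta) = \Pr(T \le 3\alpha_0^{1/n} \mid \theta) + \Pr(T \ge 3 \mid \theta).$$

Since $\Pr(T \ge 3 \mid \theta) = 0$ for $\theta \le 3$, the function $\pi(\theta \mid \delta)$ is as sketched in Exercise 9 for $\theta \le 3$. For $\theta > 3$,

$$\pi(\theta \mid \delta) = \left(\frac{3}{\theta}\right)^n \alpha_0 + 1 - \left(\frac{3}{\theta}\right)^n > \alpha_0. \qquad (S.9.10)$$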


Copyright © 2012 Pearson Education, Inc. Publishing as Addison-Wesley. 




Figure S.9.5: Figure for Exercise 9 of Sec. 9.4.


(b) In order for a test $\delta$ to be UMP level $\alpha_0$ for (9.4.15), it is necessary and sufficient that the following three things happen:
• $\delta$ has the same power function as the test in Exercise 6 for $\theta > 3$.
• $\delta$ has the same power function as the test in Exercise 8 for $\theta < 3$.
• $\pi(3 \mid \delta) \le \alpha_0$.
Because $c_2 = 3$, we saw in part (a) that $\pi(\theta \mid \delta)$ is the same as the power function of the test in Exercise 8 for $\theta < 3$. We also saw in part (a) that $\pi(3 \mid \delta) = 0.05 = \alpha_0$. For $\theta > 3$, the power function of the test in Exercise 6 is

$$\Pr(T \ge 3(1 - \alpha_0)^{1/n} \mid \theta) = 1 - \left(\frac{3(1 - \alpha_0)^{1/n}}{\theta}\right)^n = 1 - \left(\frac{3}{\theta}\right)^n (1 - \alpha_0).$$

It is straightforward to see that this is the same as (S.9.10).


11. It can be verified that if $c_1$ and $c_2$ are chosen to be symmetric with respect to the value $\mu_0$, then the power function $\pi(\mu \mid \delta)$ will be symmetric with respect to the point $\mu = \mu_0$ and will attain its minimum value at $\mu = \mu_0$. Therefore, if $c_1$ and $c_2$ are chosen as in Exercise 1, the required conditions will be satisfied.


12. The power function of the test $\delta$ described in this exercise is

$$\pi(\beta \mid \delta) = 1 - \exp(-c_1\beta) + \exp(-c_2\beta).$$

(a) In order for $\delta$ to have level of significance $\alpha_0$, we must have $\pi(1 \mid \delta) \le \alpha_0$. Indeed, the test will have size $\alpha_0$ exactly if

$$\alpha_0 = 1 - \exp(-c_1) + \exp(-c_2).$$

(b) We can let $c_1 = -\log(1 - \alpha_0/2)$ and $c_2 = -\log(\alpha_0/2)$ to solve this equation.
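The choice in part (b) puts probability $\alpha_0/2$ in each tail, which can be confirmed numerically. In this sketch of our own, the level `alpha0 = 0.05` is only an illustrative value and the function name is hypothetical.

```python
import math

def size_at_beta_1(c1, c2):
    """pi(1 | delta) = 1 - exp(-c1) + exp(-c2), the size of the test."""
    return 1 - math.exp(-c1) + math.exp(-c2)

alpha0 = 0.05                      # illustrative level, not fixed by the exercise
c1 = -math.log(1 - alpha0 / 2)     # lower tail has probability alpha0 / 2
c2 = -math.log(alpha0 / 2)         # upper tail has probability alpha0 / 2
print(c1, c2, size_at_beta_1(c1, c2))
```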


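13. The first term on the right of (9.4.13) is

$$\int_0^x \frac{\theta^n}{\Gamma(n)} t^{n-1} \exp(-t\theta)\,dt = G(x; n, \theta).$$

The second term on the right of (9.4.13) is the negative of

$$\int_0^x \frac{\theta^n}{\Gamma(n)} t^n \exp(-t\theta)\,dt = \frac{n}{\theta} \int_0^x \frac{\theta^{n+1}}{\Gamma(n+1)} t^n \exp(-t\theta)\,dt = \frac{n}{\theta}\,G(x; n+1, \theta).$$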
9.5 The t Test


Commentary 


This section provides a natural continuation to Sec. 9.1 in a modern statistics course. We introduce the t 
test and its power function, defined in terms of the noncentral t distribution. The theoretical derivation of 
the t test as a likelihood ratio test is isolated at the end of the section and could easily be skipped without 
interrupting the flow of material. Indeed, that derivation should only be of interest in a fairly mathematical 
statistics course. 

As with confidence intervals, computer software can replace tables for obtaining quantiles of the t distributions that are used in tests. The R function qt can compute these. For computing p-values, one can use pt. The precise use of pt depends on whether the alternative hypothesis is one-sided or two-sided. For testing $H_0: \mu \le \mu_0$ versus $H_1: \mu > \mu_0$ using the statistic U in Eq. (9.5.2), the p-value would be 1-pt(u,n-1), where u is the observed value of U. For the opposite one-sided hypotheses, the p-value would be pt(u,n-1). For testing $H_0: \mu = \mu_0$ versus $H_1: \mu \ne \mu_0$, the p-value is 2*(1-pt(abs(u),n-1)). The power function of a t test can be computed using the optional third parameter with pt, which is the noncentrality parameter (whose default value is 0). Similar considerations apply to the comparison of two means in Sec. 9.6.
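For readers without R, the three p-value formulas can be sketched in plain Python. The function `t_cdf` below is our own stand-in for R's pt (central case only); it integrates the t density with Simpson's rule and is meant as an illustration rather than a replacement for a statistical library. The value u = 1.727 with 9 degrees of freedom is the one that arises in Exercise 1 of this section.

```python
import math

def t_pdf(x, df):
    """Density of the central t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(u, df, steps=2000):
    """Stand-in for R's pt(u, df): Simpson's rule on [0, |u|] plus symmetry.
    steps must be even; math.gamma limits df to a few hundred."""
    a = abs(u)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if u >= 0 else 0.5 - area

u, n = 1.727, 10
p_upper = 1 - t_cdf(u, n - 1)              # H1: mu > mu0
p_lower = t_cdf(u, n - 1)                  # H1: mu < mu0
p_two = 2 * (1 - t_cdf(abs(u), n - 1))     # H1: mu != mu0
print(p_upper, p_lower, p_two)
```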


Solutions to Exercises 


1. We computed the summary statistics $\bar{x}_n = 1.379$ and $\sigma' = 0.3277$ in Example 8.5.4.

(a) The test statistic is U from (9.5.2),

$$U = 10^{1/2}\,\frac{1.379 - 1.2}{0.3277} = 1.727.$$

We reject $H_0$ at level $\alpha_0 = 0.05$ if $U \ge 1.833$, the 0.95 quantile of the t distribution with 9 degrees of freedom. Since $1.727 < 1.833$, we do not reject $H_0$ at level 0.05.
(b) We need to compute the probability that a t random variable with 9 degrees of freedom exceeds 1.727. This probability can be computed by most statistical software, and it equals 0.0591. Without a computer, one could interpolate in the table of the t distribution in the back of the book. That would yield 0.0618.


2. When $\mu = 20$, the statistic U given by Eq. (9.5.2) has a t distribution with 8 degrees of freedom. The value of U in this exercise is 2.
(a) We would reject $H_0$ if $U \ge 1.860$. Therefore, we reject $H_0$.
(b) We would reject $H_0$ if $U \le -2.306$ or $U \ge 2.306$. Therefore, we don't reject $H_0$.
(c) We should include in the confidence interval all values of $\mu_0$ for which the value of U given by Eq. (9.5.2) will lie between −2.306 and 2.306. These values form the interval $19.694 < \mu_0 < 24.306$.


3. It must be assumed that the miles per gallon obtained from the different tankfuls are independent and identically distributed, and that each has a normal distribution. When $\mu = 20$, the statistic U given by Eq. (9.5.2) has the t distribution with 8 degrees of freedom. Here, we are testing the following hypotheses:

$$H_0: \mu \ge 20, \qquad H_1: \mu < 20.$$

We would reject $H_0$ if $U \le -1.860$. From the given value, it is found that $\bar{X}_n = 19$ and $S_n^2 = 22$. Hence, $U = -1.809$ and we do not reject $H_0$.
4. When $\mu = 0$, the statistic U given by Eq. (9.5.2) has the t distribution with 7 degrees of freedom. Here

$$\bar{X}_n = \frac{-11.2}{8} = -1.4$$

and

$$\sum_{i=1}^{8} (X_i - \bar{X}_n)^2 = 43.7 - 8(1.4)^2 = 28.02.$$

The value of U can now be found to be −1.979. We should reject $H_0$ if $U \le -1.895$ or $U \ge 1.895$. Therefore, we reject $H_0$.


5. It is found from the table of the t distribution with 7 degrees of freedom that $c_1 = -2.998$ and $c_2$ lies between 1.415 and 1.895. Since $U = -1.979$, we do not reject $H_0$.


6. Let U be given by Eq. (9.5.2) and suppose that $c$ is chosen so that the level of significance of the test is $\alpha_0$. Then

$$\pi(\mu, \sigma^2 \mid \delta) = \Pr(U \ge c \mid \mu, \sigma^2).$$

If we let $Y = n^{1/2}(\bar{X}_n - \mu)/\sigma$ and $Z = \sum_{i=1}^{n} (X_i - \bar{X}_n)^2/\sigma^2$, then $Y$ will have a standard normal distribution, $Z$ will have a $\chi^2$ distribution with $n - 1$ degrees of freedom, and $Y$ and $Z$ will be independent. Also,

$$U = \frac{Y + n^{1/2}(\mu - \mu_0)/\sigma}{[Z/(n - 1)]^{1/2}}.$$

It follows that all pairs $(\mu, \sigma^2)$ which yield the same value of $(\mu - \mu_0)/\sigma$ will yield the same value of $\pi(\mu, \sigma^2 \mid \delta)$.


7. The random variable $T = (X - \mu)/\sigma$ will have the standard normal distribution, the random variable $Z = \sum_{i=1}^{n} Y_i^2/\sigma^2$ will have a $\chi^2$ distribution with $n$ degrees of freedom, and $T$ and $Z$ will be independent. Therefore, when $\mu = \mu_0$, the following random variable U will have the t distribution with $n$ degrees of freedom:

$$U = \frac{T}{[Z/n]^{1/2}} = \frac{n^{1/2}(X - \mu_0)}{\left[\sum_{i=1}^{n} Y_i^2\right]^{1/2}}.$$

The hypothesis $H_0$ would be rejected if $U \ge c$.


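8. When $\sigma^2 = \sigma_0^2$, $S_n^2/\sigma_0^2$ has a $\chi^2$ distribution with $n - 1$ degrees of freedom. Choose $c$ so that, when $\sigma^2 = \sigma_0^2$, $\Pr(S_n^2/\sigma_0^2 \ge c) = \alpha_0$, and reject $H_0$ if $S_n^2/\sigma_0^2 \ge c$. Then $\pi(\mu, \sigma^2 \mid \delta) = \alpha_0$ if $\sigma^2 = \sigma_0^2$. If $\sigma^2 \ne \sigma_0^2$, then $T = S_n^2/\sigma^2$ has the $\chi^2$ distribution with $n - 1$ degrees of freedom, and $S_n^2/\sigma_0^2 = (\sigma^2/\sigma_0^2)T$. Therefore,

$$\pi(\mu, \sigma^2 \mid \delta) = \Pr(S_n^2/\sigma_0^2 \ge c \mid \mu, \sigma^2) = \Pr(T \ge c\sigma_0^2/\sigma^2).$$

If $\sigma_0^2/\sigma^2 > 1$, then $\pi(\mu, \sigma^2 \mid \delta) < \Pr(T \ge c) = \alpha_0$. If $\sigma_0^2/\sigma^2 < 1$, then $\pi(\mu, \sigma^2 \mid \delta) > \Pr(T \ge c) = \alpha_0$.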
9. When $\sigma^2 = 4$, $S_n^2/4$ has the $\chi^2$ distribution with 9 degrees of freedom. We would reject $H_0$ if $S_n^2/4 \ge 16.92$. Since $S_n^2/4 = 60/4 = 15$, we do not reject $H_0$.


10. When $\sigma^2 = 4$, $S_n^2/4$ has the $\chi^2$ distribution with 9 degrees of freedom. Therefore, $\Pr(S_n^2/4 \le 2.700) = \Pr(S_n^2/4 \ge 19.02) = 0.025$. It follows that $c_1 = 4(2.700) = 10.80$ and $c_2 = 4(19.02) = 76.08$.


11. $U_1$ has the distribution of $X/Y$ where $X$ has a normal distribution with mean $\psi$ and variance 1, and $Y$ is independent of $X$ such that $mY^2$ has the $\chi^2$ distribution with $m$ degrees of freedom. Notice that $-X$ has a normal distribution with mean $-\psi$ and variance 1 and is independent of $Y$. So $U_2$ has the distribution of $-X/Y = -U_1$. So

$$\Pr(U_2 \le -c) = \Pr(-U_1 \le -c) = \Pr(U_1 \ge c).$$

12. The statistic U has the t distribution with 16 degrees of freedom. The calculated value is

$$U = \frac{\sqrt{17}(\bar{X}_n - 3)}{(S_n^2/16)^{1/2}} = \frac{0.2}{(0.09/16)^{1/2}} = \frac{8}{3},$$

and the corresponding tail area is $\Pr(U \ge 8/3)$.


13. The test statistic is $U = 169^{1/2}(3.2 - 3)/(0.09)^{1/2} = 8.667$. The p-value can be calculated using statistical software as $1 - T_{169}(8.667) = 1.776 \times 10^{-15}$.


14. The statistic U has the t distribution with 16 degrees of freedom. The calculated value of U is

$$\frac{0.1}{(0.09/16)^{1/2}} = \frac{4}{3}.$$

Because the alternative hypothesis is two-sided, the corresponding tail area is

$$\Pr\left(U \ge \frac{4}{3}\right) + \Pr\left(U \le -\frac{4}{3}\right) = 2\Pr\left(U \ge \frac{4}{3}\right).$$
( >5)+ ( = 3) i =) 


15. The test statistic is $U = 169^{1/2}(3.2 - 3.1)/(0.09)^{1/2} = 4.333$. The p-value can be calculated using statistical software as $2[1 - T_{169}(4.333)] = 2.512 \times 10^{-5}$.


16. The calculated value of U is

$$\frac{-0.1}{(0.09/16)^{1/2}} = -\frac{4}{3}.$$

Since this value is the negative of the value found in Exercise 14, the corresponding tail area will be the same as in Exercise 14.


17. The denominator of $\Lambda(x)$ is still (9.5.11). The M.L.E. $(\hat{\mu}_0, \hat{\sigma}_0^2)$ is easier to calculate in this exercise, namely $\hat{\mu}_0 = \mu_0$ (the only possible value) and

$$\hat{\sigma}_0^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu_0)^2.$$

These are the same values that lead to Eq. (9.5.12) in the text. Hence, $\Lambda(x)$ has the value given in Eq. (9.5.14). For $k < 1$, $\Lambda(x) \le k$ if and only if

$$|U| \ge \left((n - 1)\left[k^{-2/n} - 1\right]\right)^{1/2} = c.$$
18. In this case $\Omega_0 = \{(\mu, \sigma^2) : \mu \ge \mu_0\}$, and $\Lambda(x) = 1$ if $\bar{x}_n \ge \mu_0$. If $\bar{x}_n < \mu_0$, then the numerator of $\Lambda(x)$ is (9.5.12), and the formula for $\Lambda(x)$ is the same as (9.5.13) with the branch labels switched. This time, $\Lambda(x)$ is a non-decreasing function of $u$, the observed value of U. So for $k < 1$, $\Lambda(x) \le k$ if and only if $U \le c$, for the same $c$ as in Example 9.5.12.


9.6 Comparing the Means of Two Normal Distributions 


Commentary 


The two-sample t test is introduced for the case of equal variances. There is some material near the end
of the section about the case of unequal variances. This is useful material, but is not traditionally covered 
and can be skipped. Also, the derivation of the two-sample t test as a likelihood ratio test is provided for 
mathematical interest at the end of the section. 


Solutions to Exercises 


1. In this example, $n = 5$, $m = 5$, $\bar{X}_m = 18.18$, $\bar{Y}_n = 17.32$, $S_X^2 = 12.61$, and $S_Y^2 = 11.01$. Then

$$U = \frac{(5 + 5 - 2)^{1/2}(18.18 - 17.32)}{\left(\frac{1}{5} + \frac{1}{5}\right)^{1/2}(11.01 + 12.61)^{1/2}} = 0.7913.$$

We see that $|U| = 0.7913$ is much smaller than the 0.975 quantile of the t distribution with 8 degrees of freedom.
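The arithmetic behind the two-sample statistic can be checked with a few lines of Python. This sketch is our own addition; the function name `two_sample_t` is hypothetical, and its arguments `sx2` and `sy2` are the sums of squared deviations $S_X^2$ and $S_Y^2$, not sample variances.

```python
import math

def two_sample_t(m, n, xbar, ybar, sx2, sy2):
    """Two-sample t statistic for equal variances, as in Eq. (9.6.3).
    sx2 and sy2 are sums of squared deviations about the sample means."""
    num = math.sqrt(m + n - 2) * (xbar - ybar)
    den = math.sqrt(1 / m + 1 / n) * math.sqrt(sx2 + sy2)
    return num / den

u = two_sample_t(5, 5, 18.18, 17.32, 12.61, 11.01)
print(round(u, 4))
```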


2. In this exercise, $m = 8$, $n = 6$, $\bar{x}_m = 1.5125$, $\bar{y}_n = 1.6683$, $S_X^2 = 0.18075$, and $S_Y^2 = 0.16768$. When $\mu_1 = \mu_2$, the statistic U defined by Eq. (9.6.3) will have the t distribution with 12 degrees of freedom. The hypotheses are as follows:

$$H_0: \mu_1 \ge \mu_2, \qquad H_1: \mu_1 < \mu_2.$$

Since the inequalities are reversed from those in (9.6.1), the hypothesis $H_0$ should be rejected if $U < c$. It is found from a table that $c = -1.356$. The calculated value of U is −1.692. Therefore, $H_0$ is rejected.


3. The value $c = 1.782$ can be found from a table of the t distribution with 12 degrees of freedom. Since $U = -1.692$, $H_0$ is not rejected.


4. The random variable $\bar{X}_m - \bar{Y}_n$ has a normal distribution with mean 0 and variance $(\sigma_1^2/m) + (k\sigma_1^2/n)$. Therefore, the following random variable has the standard normal distribution:

$$Z_1 = \frac{\bar{X}_m - \bar{Y}_n}{\left(\frac{1}{m} + \frac{k}{n}\right)^{1/2} \sigma_1}.$$

The random variable $S_X^2/\sigma_1^2$ has a $\chi^2$ distribution with $m - 1$ degrees of freedom. The random variable $S_Y^2/(k\sigma_1^2)$ has a $\chi^2$ distribution with $n - 1$ degrees of freedom. These two random variables are independent. Therefore, $Z_2 = (1/\sigma_1^2)(S_X^2 + S_Y^2/k)$ has a $\chi^2$ distribution with $m + n - 2$ degrees of freedom. Since $Z_1$ and $Z_2$ are independent, it follows that $U = (m + n - 2)^{1/2} Z_1/Z_2^{1/2}$ has the t distribution with $m + n - 2$ degrees of freedom.
5. Again, $H_0$ should be rejected if $U < -1.356$. Since $U = -1.672$, $H_0$ is rejected.


6. If $\mu_1 - \mu_2 = \lambda$, the following statistic U will have the t distribution with $m + n - 2$ degrees of freedom:

$$U = \frac{(m + n - 2)^{1/2}(\bar{X}_m - \bar{Y}_n - \lambda)}{\left(\frac{1}{m} + \frac{1}{n}\right)^{1/2}(S_X^2 + S_Y^2)^{1/2}}.$$

The hypothesis $H_0$ should be rejected if either $U < c_1$ or $U > c_2$.


7. To test the hypotheses in Exercise 6, $H_0$ would not be rejected if $-1.782 < U < 1.782$. The set of all values of $\lambda$ for which $H_0$ would not be rejected will form a confidence interval for $\mu_1 - \mu_2$ with confidence coefficient 0.90. The value of U, for an arbitrary value of $\lambda$, is found to be

$$U = \frac{\sqrt{12}(-0.1558 - \lambda)}{0.3188}.$$

It is found that $-1.782 < U < 1.782$ if and only if $-0.320 < \lambda < 0.008$.


8. The noncentrality parameter when $|\mu_1 - \mu_2| = \sigma$ is

$$\psi = \frac{1}{\left(\frac{1}{8} + \frac{1}{10}\right)^{1/2}} = 2.108.$$

The degrees of freedom are 16. Figure 9.14 in the text makes it look like the power is about 0.23. Using computer software, we can compute the noncentral t probability to be 0.248.


9. The p-value can be computed as the size of the test that rejects $H_0$ when $|U| \ge |u|$, where $u$ is the observed value of the test statistic. Since U has the t distribution with $m + n - 2$ degrees of freedom when $H_0$ is true, the size of the test that rejects $H_0$ when $|U| \ge |u|$ is the probability that a t random variable with $m + n - 2$ degrees of freedom is either less than $-|u|$ or greater than $|u|$. This probability is

$$T_{m+n-2}(-|u|) + 1 - T_{m+n-2}(|u|) = 2[1 - T_{m+n-2}(|u|)],$$

by the symmetry of t distributions.


10. Let $X_i$ stand for an observation in the calcium supplement group and let $Y_j$ stand for an observation in the placebo group. The summary statistics are

$m = 10$,
$n = 11$,
$\bar{x}_m = 109.9$,
$\bar{y}_n = 113.9$,
$s_x^2 = 546.9$,
$s_y^2 = 1282.5$.

We would reject the null hypothesis if $U > T_{19}^{-1}(0.9) = 1.328$. The test statistic has the observed value $u = -0.9350$. Since $u < 1.328$, we do not reject the null hypothesis.
11. (a) The observed value of the test statistic U is

$$u = \frac{(43 + 35 - 2)^{1/2}(8.560 - 5.551)}{\left(\frac{1}{43} + \frac{1}{35}\right)^{1/2}(2745.7 + 783.9)^{1/2}} = 1.939.$$

We would reject the null hypothesis at level $\alpha_0 = 0.01$ if $U \ge 2.376$, the 0.99 quantile of the t distribution with 76 degrees of freedom. Since $u < 2.376$, we do not reject $H_0$ at level 0.01. (The answer in the back of the book is incorrect in early printings.)

(b) For Welch's test, the approximate degrees of freedom is

$$\nu = \frac{\left(\frac{2745.7}{43 \times 42} + \frac{783.9}{35 \times 34}\right)^2}{\frac{1}{42}\left(\frac{2745.7}{43 \times 42}\right)^2 + \frac{1}{34}\left(\frac{783.9}{35 \times 34}\right)^2} = 70.04.$$

The corresponding t quantile is 2.381. The test statistic is

$$\frac{8.560 - 5.551}{\left(\frac{2745.7}{43 \times 42} + \frac{783.9}{35 \times 34}\right)^{1/2}} = 2.038.$$

Once again, we do not reject $H_0$. (The answer in the back of the book is incorrect in early printings.)
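Both the pooled statistic in part (a) and the Welch quantities in part (b) follow directly from the summary statistics. The Python sketch below is our own; the variable names (`a`, `b`, `nu`, `v_welch`, `u_pooled`) are not from the text, and `sx2`, `sy2` are the sums of squared deviations.

```python
import math

m, n = 43, 35
xbar, ybar = 8.560, 5.551
sx2, sy2 = 2745.7, 783.9           # sums of squared deviations

# Estimated variances of the two sample means.
a = sx2 / (m * (m - 1))
b = sy2 / (n * (n - 1))

# (a) Pooled two-sample t statistic with m + n - 2 = 76 degrees of freedom.
u_pooled = (math.sqrt(m + n - 2) * (xbar - ybar)
            / (math.sqrt(1 / m + 1 / n) * math.sqrt(sx2 + sy2)))

# (b) Welch's approximate degrees of freedom and test statistic.
nu = (a + b) ** 2 / (a ** 2 / (m - 1) + b ** 2 / (n - 1))
v_welch = (xbar - ybar) / math.sqrt(a + b)

print(u_pooled, nu, v_welch)
```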


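12. The W in (9.6.15) is the sum of two independent random variables, one having a gamma distribution with parameters $(m - 1)/2$ and $m(m - 1)/(2\sigma_1^2)$ and the other having a gamma distribution with parameters $(n - 1)/2$ and $n(n - 1)/(2\sigma_2^2)$. So, the mean and variance of W are

$$E(W) = \frac{(m - 1)/2}{m(m - 1)/(2\sigma_1^2)} + \frac{(n - 1)/2}{n(n - 1)/(2\sigma_2^2)} = \frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n},$$
$$\mathrm{Var}(W) = \frac{(m - 1)/2}{m^2(m - 1)^2/(4\sigma_1^4)} + \frac{(n - 1)/2}{n^2(n - 1)^2/(4\sigma_2^4)} = \frac{2\sigma_1^4}{m^2(m - 1)} + \frac{2\sigma_2^4}{n^2(n - 1)}.$$

The gamma distribution with parameters $\alpha$ and $\beta$ has the above mean and variance if $\alpha/\beta = E(W)$ and $\alpha/\beta^2 = \mathrm{Var}(W)$. In particular, $\alpha = E(W)^2/\mathrm{Var}(W)$, so

$$2\alpha = \frac{\left(\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}\right)^2}{\frac{\sigma_1^4}{m^2(m - 1)} + \frac{\sigma_2^4}{n^2(n - 1)}}.$$

This is easily seen to be the same as the expression in (9.6.16).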
13. The likelihood ratio statistic for this case is

$$\Lambda(x, y) = \frac{\sup_{\{(\mu_1, \mu_2, \sigma^2) : \mu_1 = \mu_2\}} g(x, y \mid \mu_1, \mu_2, \sigma^2)}{\sup_{\{(\mu_1, \mu_2, \sigma^2) : \mu_1 \ne \mu_2\}} g(x, y \mid \mu_1, \mu_2, \sigma^2)}. \qquad (S.9.11)$$

Maximizing the numerator of (S.9.11) is identical to maximizing the numerator of (9.6.10) when $\bar{x}_m \le \bar{y}_n$ because we need $\mu_1 = \mu_2$ in both cases. So the M.L.E.'s are

$$\hat{\mu}_1 = \hat{\mu}_2 = \frac{m\bar{x}_m + n\bar{y}_n}{m + n},$$
$$\hat{\sigma}^2 = \frac{mn(\bar{x}_m - \bar{y}_n)^2/(m + n) + s_x^2 + s_y^2}{m + n}.$$

Maximizing the denominator of (S.9.11) is identical to the maximization of the denominator of (9.6.10) when $\bar{x}_m \le \bar{y}_n$. We use the overall M.L.E.'s

$$\hat{\mu}_1 = \bar{x}_m, \quad \hat{\mu}_2 = \bar{y}_n, \quad \text{and} \quad \hat{\sigma}^2 = \frac{1}{m + n}(s_x^2 + s_y^2).$$

This makes $\Lambda(x, y)$ equal to $(1 + v^2)^{-(m+n)/2}$ where $v$ is defined in (9.6.12). So $\Lambda(x, y) \le k$ if and only if $v^2 \ge k'$ for some other constant $k'$. This translates easily to $|U| \ge c$.


9.7 The F Distributions 


Commentary 


The F distributions are introduced along with the F test for equality of variances from normal samples. The power function of the F test is derived also. The derivation of the F test as a likelihood ratio test is provided for mathematical interest.

Those using the software R can make use of the functions df, pf, and qf which compute respectively the p.d.f., c.d.f., and quantile function of an arbitrary F distribution. The first argument is the argument of the function and the next two are the degrees of freedom. The function rf produces a random sample of F random variables.


Solutions to Exercises 


1. The test statistic is V = [2745.7/42]/[783.9/34] = 2.835. We reject the null hypothesis if V is greater than the 0.95 quantile of the F distribution with 42 and 34 degrees of freedom, which is 1.737. So, we reject the null hypothesis at level 0.05.


2. Let $Y = 1/X$. Then $Y$ has the F distribution with 8 and 3 degrees of freedom. Also

$$\Pr(X > c) = \Pr\left(Y < \frac{1}{c}\right) = 0.975.$$

It can be found from the table given at the end of the book that $\Pr(Y < 14.54) = 0.975$. Therefore, $1/c = 14.54$ and $c = 0.069$.


3. If $Y$ has the t distribution with 8 degrees of freedom, then $X = Y^2$ will have the F distribution with 1 and 8 degrees of freedom. Also,

$$0.3 = \Pr(X > c) = \Pr(Y > \sqrt{c}) + \Pr(Y < -\sqrt{c}) = 2\Pr(Y > \sqrt{c}).$$

Therefore, $\Pr(Y > \sqrt{c}) = 0.15$. It can be found from the table given at the end of the book that $\Pr(Y > 1.108) = 0.15$. Hence, $\sqrt{c} = 1.108$ and $c = 1.228$.


4. Suppose that $X$ is represented as in Eq. (9.7.1). Since $Y$ and $Z$ are independent,

$$E(X) = \frac{n}{m} E\left(\frac{Y}{Z}\right) = \frac{n}{m} E(Y) E\left(\frac{1}{Z}\right).$$

Since $Y$ has the $\chi^2$ distribution with $m$ degrees of freedom, $E(Y) = m$. Since $Z$ has the $\chi^2$ distribution with $n$ degrees of freedom,

$$E\left(\frac{1}{Z}\right) = \int_0^\infty \frac{1}{z} f(z)\,dz = \frac{1}{2^{n/2}\Gamma(n/2)} \int_0^\infty z^{(n/2)-2} \exp(-z/2)\,dz$$
$$= \frac{2^{(n/2)-1}\Gamma[(n/2) - 1]}{2^{n/2}\Gamma(n/2)} = \frac{1}{2[(n/2) - 1]} = \frac{1}{n - 2}.$$

Hence, $E(X) = n/(n - 2)$.


5. By Eq. (9.7.1), $X$ can be represented in the form $X = Y/Z$, where $Y$ and $Z$ are independent and have identical $\chi^2$ distributions. Therefore, $\Pr(Y > Z) = \Pr(Y < Z) = 1/2$. Equivalently, $\Pr(X > 1) = \Pr(X < 1) = 1/2$. Therefore, the median of the distribution of $X$ is 1.


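6. Let $f(x)$ denote the p.d.f. of $X$, let $W = mX/(mX + n)$, and let $g(w)$ denote the p.d.f. of $W$. Then $X = nW/[m(1 - W)]$ and

$$\frac{dx}{dw} = \frac{n}{m} \cdot \frac{1}{(1 - w)^2}.$$

For $0 < w < 1$,

$$g(w) = f\left(\frac{nw}{m(1 - w)}\right) \frac{dx}{dw} = k \left(\frac{n}{m}\right)^{(m/2)-1} \frac{w^{(m/2)-1}}{(1 - w)^{(m/2)-1}} \cdot \frac{(1 - w)^{(m+n)/2}}{n^{(m+n)/2}} \cdot \frac{n}{m} \cdot \frac{1}{(1 - w)^2}$$
$$= \frac{k}{m^{m/2} n^{n/2}}\, w^{(m/2)-1} (1 - w)^{(n/2)-1},$$

where

$$k = \frac{\Gamma[(m + n)/2]\, m^{m/2} n^{n/2}}{\Gamma(m/2)\Gamma(n/2)}.$$

It can be seen that $g(w)$ is the p.d.f. of the required beta distribution.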
7. (a) Here, $\bar{X}_m = 84/16 = 5.25$ and $\bar{Y}_n = 18/10 = 1.8$. Therefore, $S_X^2 = \sum_{i=1}^{16} X_i^2 - 16(\bar{X}_m^2) = 122$ and $S_Y^2 = \sum_{i=1}^{10} Y_i^2 - 10(\bar{Y}_n^2) = 39.6$. It follows that

$$\hat{\sigma}_1^2 = \frac{1}{16} S_X^2 = 7.625 \quad \text{and} \quad \hat{\sigma}_2^2 = \frac{1}{10} S_Y^2 = 3.96.$$

If $\sigma_1^2 = \sigma_2^2$, the following statistic V will have the F distribution with 15 and 9 degrees of freedom:

$$V = \frac{S_X^2/15}{S_Y^2/9}.$$

(b) If the test is to be carried out at the level of significance 0.05, then $H_0$ should be rejected if $V > 3.01$. It is found that $V = 1.848$. Therefore, we do not reject $H_0$.
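The value of V in part (b) can be reproduced from the two sums of squares. In this sketch of our own, `f_statistic` is a hypothetical helper and 3.01 is the table quantile quoted in the solution.

```python
def f_statistic(sx2, m, sy2, n):
    """Ratio of the two unbiased variance estimates; sx2 and sy2 are
    sums of squared deviations from samples of sizes m and n."""
    return (sx2 / (m - 1)) / (sy2 / (n - 1))

v = f_statistic(122.0, 16, 39.6, 10)
reject = v > 3.01   # 0.95 quantile of F with 15 and 9 degrees of freedom, from the table
print(round(v, 3), reject)
```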


8. For any values of $\sigma_1^2$ and $\sigma_2^2$, the random variable

$$\frac{S_X^2/(15\sigma_1^2)}{S_Y^2/(9\sigma_2^2)}$$

has the F distribution with 15 and 9 degrees of freedom. Therefore, if $\sigma_1^2 = 3\sigma_2^2$, the following statistic V will have that F distribution:

$$V = \frac{S_X^2/45}{S_Y^2/9}.$$

As before, $H_0$ should be rejected if $V > c$, where $c = 3.01$ if the desired level of significance is 0.05.


9. When $\sigma_1^2 = \sigma_2^2$, V has an F distribution with 15 and 9 degrees of freedom. Therefore, $\Pr(V > 3.77) = 0.025$, which implies that $c_2 = 3.77$. Also, $1/V$ has an F distribution with 9 and 15 degrees of freedom. Therefore, $\Pr(1/V > 3.12) = 0.025$. It follows that $\Pr(V < 1/(3.12)) = 0.025$, which means that $c_1 = 1/(3.12) = 0.321$.


10. Let V be as defined in Exercise 9. If $\sigma_1^2 = r\sigma_2^2$, then $V/r$ has the F distribution with 15 and 9 degrees of freedom. Therefore, $H_0$ would be rejected if $V/r < c_1$ or $V/r > c_2$, where $c_1$ and $c_2$ have the values found in Exercise 9.


11. For any positive number $r$, the hypothesis $H_0$ in Exercise 9 will not be rejected if $c_1 < V/r < c_2$. The set of all values of $r$ for which $H_0$ would not be rejected will form a confidence interval with confidence coefficient 0.95. But $c_1 < V/r < c_2$ if and only if $V/c_2 < r < V/c_1$. Therefore, the confidence interval will contain all values of $r$ between $V/3.77 = 0.265V$ and $V/0.321 = 3.12V$.


12. If a random variable $Z$ has the $\chi^2$ distribution with $n$ degrees of freedom, $Z$ can be represented as the sum of $n$ independent and identically distributed random variables $Z_1, \ldots, Z_n$, each of which has a $\chi^2$ distribution with 1 degree of freedom. Therefore, $Z/n = \sum_{i=1}^{n} Z_i/n = \bar{Z}_n$. As $n \to \infty$, it follows from the law of large numbers that $\bar{Z}_n$ will converge in probability to the mean of each $Z_i$, which is 1. Therefore $Z/n \stackrel{p}{\to} 1$. It follows from Eq. (9.7.1) that if $X$ has the F distribution with $m_0$ and $n$ degrees of freedom, then as $n \to \infty$, the distribution of $X$ will become the same as the distribution of $Y/m_0$.


13. Suppose that $X$ has the F distribution with $m$ and $n$ degrees of freedom, and consider the representation of $X$ in Eq. (9.7.1). Then $Y/m \stackrel{p}{\to} 1$. Therefore, as $m \to \infty$, the distribution of $X$ will become the same as the distribution of $n/Z$, where $Z$ has a $\chi^2$ distribution with $n$ degrees of freedom. Suppose that $c$ is the 0.05 quantile of the $\chi^2$ distribution with $n$ degrees of freedom. Then $\Pr(n/Z < n/c) = 0.95$. Hence, $\Pr(X < n/c) = 0.95$, and the value $n/c$ should be entered in the column of the F distribution with $m = \infty$.


14. The test rejects the null hypothesis if the $F$ statistic is greater than the 0.95 quantile of the $F$ distribution
with 15 and 9 degrees of freedom, which is 3.01. The power of the test when $\sigma_1^2 = 2\sigma_2^2$ is
$1 - G_{15,9}(3.01/2) = 0.2724$. This can be computed using a computer program that evaluates the c.d.f.
of an arbitrary $F$ distribution.


15. The p-value will be the value of $\alpha_0$ such that the observed $v$ is exactly equal to either $c_1$ or $c_2$. The
problem is deciding whether $v = c_1$ or $v = c_2$, since we haven't constructed a specific test. Since $c_1$
and $c_2$ are assumed to be the $\alpha_0/2$ and $1 - \alpha_0/2$ quantiles of the $F$ distribution with $m - 1$ and
$n - 1$ degrees of freedom, we must have $c_1 < c_2$ and $G_{m-1,n-1}(c_1) < 1/2$ and $G_{m-1,n-1}(c_2) > 1/2$.
These inequalities allow us to choose whether $v = c_1$ or $v = c_2$. Every $v > 0$ is some quantile
of each $F$ distribution, indeed the $G_{m-1,n-1}(v)$ quantile. If $G_{m-1,n-1}(v) < 1/2$, then $v = c_1$ and
$\alpha_0 = 2G_{m-1,n-1}(v)$. If $G_{m-1,n-1}(v) > 1/2$, then $v = c_2$, and $\alpha_0 = 2[1 - G_{m-1,n-1}(v)]$. (There is 0
probability that $G_{m-1,n-1}(v) = 1/2$.) Hence, $\alpha_0$ is the smaller of the two numbers $2G_{m-1,n-1}(v)$ and
$2[1 - G_{m-1,n-1}(v)]$.
In Example 9.7.4, $v = 0.9491$ was the observed value and $2G_{25,25}(0.9491) = 0.8971$, so this would be
the p-value.


16. The denominator of the likelihood ratio is maximized when all parameters equal their M.L.E.'s. The
numerator is maximized when $\sigma_1^2 = \sigma_2^2$. As in the text, the likelihood ratio then equals

\[ \Lambda(x, y) = d\,w^{m/2}(1 - w)^{n/2}, \]

where $w$ and $d$ are defined in the text. In particular, $w$ is a strictly increasing function of the observed
value of $V$. Notice that $\Lambda(x, y) \le k$ when $w \le k_1$ or $w \ge k_2$. This corresponds to $V \le c_1$ or $V \ge c_2$. In
order for the test to have level $\alpha_0$, the values $c_1$ and $c_2$ have to satisfy $\Pr(V \le c_1) + \Pr(V \ge c_2) = \alpha_0$
when $\sigma_1^2 = \sigma_2^2$.


17. The test found in Exercise 9 uses the values $c_1 = 0.321$ and $c_2 = 3.77$. The likelihood ratio test rejects
$H_0$ when $d\,w^{8}(1-w)^{5} \le k$, which is equivalent to $w^{8}(1-w)^{5} \le k/d$. If $V = v$, then $w = 15v/(15v+9)$.
In order for a test to be a likelihood ratio test, the two values $c_1$ and $c_2$ must lead to the same value of
the likelihood ratio. In particular, we must have

\[ \left(\frac{15c_1}{15c_1+9}\right)^{8}\left(1-\frac{15c_1}{15c_1+9}\right)^{5} = \left(\frac{15c_2}{15c_2+9}\right)^{8}\left(1-\frac{15c_2}{15c_2+9}\right)^{5}. \]

Plugging the values of $c_1$ and $c_2$ from Exercise 9 into this formula we get $2.555 \times 10^{-5}$ on the left and
$1.497 \times 10^{-5}$ on the right. Since the two sides differ, the test in Exercise 9 is not a likelihood ratio test.
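The comparison of the two sides can be checked numerically. The following is a minimal sketch, assuming the exponents 8 and 5 and the constants 15 and 9 read off the displayed equation; the function name `lr_kernel` is hypothetical and used only for illustration.

```python
# Sketch: evaluate w**8 * (1 - w)**5 at the two critical values of Exercise 9.
# `lr_kernel` is an illustrative name, not something defined in the text.

def lr_kernel(v: float) -> float:
    """Return w**8 * (1 - w)**5, where w = 15v / (15v + 9)."""
    w = 15.0 * v / (15.0 * v + 9.0)
    return w**8 * (1.0 - w) ** 5

c1, c2 = 0.321, 3.77          # critical values from Exercise 9
left, right = lr_kernel(c1), lr_kernel(c2)

# The two sides would have to agree for an equal-tailed test to be a
# likelihood ratio test; here they do not.
print(f"left = {left:.3e}, right = {right:.3e}")
```

An equal-tailed choice of $c_1$ and $c_2$ generally gives unequal values here, which is why the likelihood ratio test needs a different pair of critical values.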


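18. Let $V^*$ be defined as in (9.7.5) so that $V^*$ has the $F$ distribution with $m - 1$ and $n - 1$ degrees of
freedom and $V = (\sigma_1^2/\sigma_2^2)V^*$. It is straightforward to compute

\[ \Pr(V \le c_1) = \Pr\left(\frac{\sigma_1^2}{\sigma_2^2}V^* \le c_1\right) = \Pr\left(V^* \le \frac{\sigma_2^2}{\sigma_1^2}c_1\right) = G_{m-1,n-1}\left(\frac{\sigma_2^2}{\sigma_1^2}c_1\right), \]

and similarly,

\[ \Pr(V \ge c_2) = 1 - G_{m-1,n-1}\left(\frac{\sigma_2^2}{\sigma_1^2}c_2\right). \]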


19. (a) Apply the result of Exercise 18 with $c_1 = G_{10,20}^{-1}(0.025) = 0.2925$ and $c_2 = G_{10,20}^{-1}(0.975) = 2.774$
and $\sigma_2^2/\sigma_1^2 = 1/1.01$. The result is

\[ G_{10,20}(c_1/1.01) + 1 - G_{10,20}(c_2/1.01) = G_{10,20}(0.289625) + 1 - G_{10,20}(2.746209) = 0.0503. \]

(b) Apply the result of Exercise 18 with $c_1 = G_{10,20}^{-1}(0.025) = 0.2925$ and $c_2 = G_{10,20}^{-1}(0.975) = 2.774$
and $\sigma_2^2/\sigma_1^2 = 1.01$. The result is

\[ G_{10,20}(1.01 \times c_1) + 1 - G_{10,20}(1.01 \times c_2) = G_{10,20}(0.2954475) + 1 - G_{10,20}(2.80148) = 0.0498. \]

(c) Since the answer to part (b) is less than 0.05 (the value of the power function for all parameters
in the null hypothesis set), the test is not unbiased.
9.8 Bayes Test Procedures


Commentary 


This section introduces Bayes tests for the situations described in the earlier sections of the chapter. It 
derives Bayes tests as solutions to decision problems in which the loss function takes only three values: 0 for 
correct decision, and one positive value for each type of error. The cases of simple and one-sided hypotheses 
are covered as are the various situations involving samples from normal distributions. The calculations are 
done with improper prior distributions so that attention can focus on the methodology and similarity with 
the non-Bayesian results. 


Solutions to Exercises 


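1. In this exercise, $\xi_0 = 0.9$, $\xi_1 = 0.1$, $w_0 = 1000$, and $w_1 = 18{,}000$. Also,

\[ f_0(x) = \frac{1}{(2\pi)^{1/2}}\exp\left[-\frac{1}{2}(x - 50)^2\right] \]

and

\[ f_1(x) = \frac{1}{(2\pi)^{1/2}}\exp\left[-\frac{1}{2}(x - 52)^2\right]. \]

By the results of this section, it should be decided that the process is out of control if

\[ \frac{f_1(x)}{f_0(x)} > \frac{\xi_0 w_0}{\xi_1 w_1} = \frac{1}{2}. \]

This inequality can be reduced to the inequality $2x - 102 > -\log 2$ or, equivalently, $x > 50.653$.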
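The reduction to a cutoff on $x$ can be sketched in a few lines. This is only an illustration: the function name `control_threshold` is hypothetical, and the alternative mean 52 is an assumption inferred from the reduced inequality $2x - 102 > -\log 2$.

```python
import math

def control_threshold(xi0, xi1, w0, w1, mu0=50.0, mu1=52.0):
    """Cutoff on x for normal densities with means mu0 < mu1 and variance 1.

    The process is declared out of control when f1(x)/f0(x) > xi0*w0/(xi1*w1).
    Taking logs, log f1(x)/f0(x) = (mu1 - mu0)*x - (mu1**2 - mu0**2)/2,
    so the rule becomes x > threshold.
    """
    log_cut = math.log((xi0 * w0) / (xi1 * w1))
    return (log_cut + (mu1**2 - mu0**2) / 2.0) / (mu1 - mu0)

# Values from Exercise 1: prior probabilities 0.9 and 0.1, costs 1000 and 18000.
threshold = control_threshold(0.9, 0.1, 1000.0, 18000.0)
print(round(threshold, 3))
```

Raising $w_1$ or $\xi_1$ lowers the cutoff, so the process is declared out of control more readily.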


2. In this exercise, $\xi_0 = 2/3$, $\xi_1 = 1/3$, $w_0 = 1$, and $w_1 = 4$. Therefore, by the results of this section, it
should be decided that $f_0$ is correct if

\[ \frac{f_1(x)}{f_0(x)} < \frac{\xi_0 w_0}{\xi_1 w_1} = \frac{1}{2}. \]

Since $f_1(x)/f_0(x) = 4x^3$, it should be decided that $f_0$ is correct if $4x^3 < 1/2$ or, equivalently, if $x < 1/2$.


3. In this exercise, $\xi_0 = 0.8$, $\xi_1 = 0.2$, $w_0 = 400$, and $w_1 = 2500$. Also, if we let $y = \sum_{i=1}^n x_i$, then

\[ f_0(X) = \frac{\exp(-3n)\,3^{y}}{\prod_{i=1}^n (x_i!)} \]

and

\[ f_1(X) = \frac{\exp(-7n)\,7^{y}}{\prod_{i=1}^n (x_i!)}. \]

By the results of this section, it should be decided that the failure was caused by a major defect if

\[ \frac{f_1(X)}{f_0(X)} = \exp(-4n)\left(\frac{7}{3}\right)^{y} > \frac{\xi_0 w_0}{\xi_1 w_1} = 0.64 \]

or, equivalently, if

\[ y > \frac{4n + \log(0.64)}{\log(7/3)}. \]

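4. In this exercise, $\xi_0 = 1/4$, $\xi_1 = 3/4$, and $w_0 = w_1 = 1$. Let $x_1,\ldots,x_n$ denote the observed values in the
sample, and let $y = \sum_{i=1}^n x_i$. Then

\[ f_0(X) = (0.3)^{y}(0.7)^{n-y} \]

and

\[ f_1(X) = (0.4)^{y}(0.6)^{n-y}. \]

By the results of this section, $H_0$ should be rejected if

\[ \frac{f_1(X)}{f_0(X)} > \frac{\xi_0 w_0}{\xi_1 w_1} = \frac{1}{3}. \]

But

\[ \frac{f_1(X)}{f_0(X)} = \left(\frac{4}{3}\right)^{y}\left(\frac{6}{7}\right)^{n-y} > \frac{1}{3} \]

if and only if

\[ y\log\frac{14}{9} + n\log\frac{6}{7} > \log\frac{1}{3} \]

or, equivalently, if and only if

\[ y > \frac{n\log(7/6) + \log(1/3)}{\log(14/9)}. \]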
5. (a) In the notation of this section $\xi_0 = \Pr(\theta = \theta_0)$ and $f_i$ is the p.f. or p.d.f. of $X$ given $\theta = \theta_i$. By
the law of total probability, the marginal p.f. or p.d.f. of $X$ is $\xi_0 f_0(x) + \xi_1 f_1(x)$. Applying Bayes'
theorem for random variables gives us that

\[ \Pr(\theta = \theta_0 \mid x) = \frac{\xi_0 f_0(x)}{\xi_0 f_0(x) + \xi_1 f_1(x)}. \]

(b) The posterior expected value of the loss given $X = x$ is

\[ \frac{w_0\xi_0 f_0(x)}{\xi_0 f_0(x) + \xi_1 f_1(x)} \ \text{if we reject } H_0, \qquad \frac{w_1\xi_1 f_1(x)}{\xi_0 f_0(x) + \xi_1 f_1(x)} \ \text{if we don't reject } H_0. \]

The tests $\delta$ that minimize $r(\delta)$ have the form

Don't reject $H_0$ if $\xi_0 w_0 f_0(x) > \xi_1 w_1 f_1(x)$,
Reject $H_0$ if $\xi_0 w_0 f_0(x) < \xi_1 w_1 f_1(x)$,
Do either if $\xi_0 w_0 f_0(x) = \xi_1 w_1 f_1(x)$.

Notice that these tests choose the action that has smaller posterior expected loss. If neither action
has smaller posterior expected loss, these tests can do either, but either would then minimize the
posterior expected loss.

(c) The "reject $H_0$" condition in part (b) is $\xi_0 w_0 f_0(x) < \xi_1 w_1 f_1(x)$. This is equivalent to
$w_0\Pr(\theta = \theta_0 \mid x) < w_1[1 - \Pr(\theta = \theta_0 \mid x)]$. Simplifying this inequality yields $\Pr(\theta = \theta_0 \mid x) < w_1/(w_0 + w_1)$.
Since we can do whatever we want when equality holds, and since "$H_0$ true" means $\theta = \theta_0$, we see
that the test described in part (c) is one of the tests from part (b).
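The equivalence between the rule in part (b) and the posterior-probability rule in part (c) can be illustrated numerically. The sketch below borrows the prior probabilities, costs and density ratio $f_1/f_0 = 4x^3$ of Exercise 2; the particular choice $f_0(x) = 1$ on $(0,1)$ is an assumption made only so the code has concrete densities, and all function names are hypothetical.

```python
# Sketch: two equivalent ways to state the Bayes test for simple hypotheses.

def posterior_h0(x, xi0, xi1, f0, f1):
    """Posterior probability that theta = theta_0 given X = x."""
    num = xi0 * f0(x)
    return num / (num + xi1 * f1(x))

def reject_by_loss(x, xi0, xi1, w0, w1, f0, f1):
    """Part (b): reject H0 when xi0*w0*f0(x) < xi1*w1*f1(x)."""
    return xi0 * w0 * f0(x) < xi1 * w1 * f1(x)

def reject_by_posterior(x, xi0, xi1, w0, w1, f0, f1):
    """Part (c): reject H0 when Pr(theta = theta_0 | x) < w1/(w0 + w1)."""
    return posterior_h0(x, xi0, xi1, f0, f1) < w1 / (w0 + w1)

# Assumed illustrative densities on (0, 1); only their ratio 4x^3 matters.
f0 = lambda x: 1.0
f1 = lambda x: 4.0 * x**3
args = (2 / 3, 1 / 3, 1.0, 4.0, f0, f1)   # xi0, xi1, w0, w1, f0, f1

grid = [0.1, 0.3, 0.49, 0.51, 0.7, 0.9]
decisions = [reject_by_loss(x, *args) for x in grid]
agree = all(reject_by_loss(x, *args) == reject_by_posterior(x, *args) for x in grid)
print(decisions, agree)
```

Both rules switch from "don't reject" to "reject" at $x = 1/2$, which matches the cutoff found in Exercise 2.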


6. The proof is just as described in the hint. For example, (9.8.12) becomes


[> Fe, eutoyuoteretopeteytta tes | sae |) ~ flea | 8) flea | W)]dea 


The steps after (9.8.12) are unchanged. 


7. We shall argue indirectly. Suppose that there is $x$ such that the p-value is not equal to the posterior
probability that $H_0$ is true. First, suppose that the p-value is greater. Let $\alpha_0$ be greater than the
posterior probability and less than the p-value. Then the test that rejects $H_0$ when $\Pr(H_0 \text{ true} \mid x) \le \alpha_0$
will reject $H_0$, but the level $\alpha_0$ test will not reject $H_0$ because the p-value is greater than $\alpha_0$. This
contradicts the fact that the two tests are the same. The case in which the p-value is smaller is very
similar.


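8. (a) The joint p.d.f. of the data given the parameters is

\[ (2\pi)^{-(m+n)/2}\,\tau^{(m+n)/2}\exp\left(-\frac{\tau}{2}\left[\sum_{i=1}^m (x_i - \mu_1)^2 + \sum_{j=1}^n (y_j - \mu_2)^2\right]\right). \]

Use the following two identities to complete the proof of this part:

\[ \sum_{i=1}^m (x_i - \mu_1)^2 = \sum_{i=1}^m (x_i - \bar{x}_m)^2 + m(\bar{x}_m - \mu_1)^2, \]
\[ \sum_{j=1}^n (y_j - \mu_2)^2 = \sum_{j=1}^n (y_j - \bar{y}_n)^2 + n(\bar{y}_n - \mu_2)^2. \]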

(b) The prior p.d.f. is just $1/\tau$.
i. As a function of $\mu_1$, the posterior p.d.f. is a constant times $\exp(-m\tau(\bar{x}_m - \mu_1)^2/2)$, which is
just like the p.d.f. of a normal distribution with mean $\bar{x}_m$ and variance $1/(m\tau)$.
ii. The result for $\mu_2$ is similar to that for $\mu_1$.
iii. As a function of $(\mu_1, \mu_2)$, the posterior p.d.f. looks like a constant times
$\exp(-m\tau(\bar{x}_m - \mu_1)^2/2)\exp(-n\tau(\bar{y}_n - \mu_2)^2/2)$,
which is like the product of the two normal p.d.f.'s from parts (i) and (ii). Hence, the
conditional posterior distribution of $(\mu_1, \mu_2)$ given $\tau$ is that of two independent normal random
variables with the two distributions from parts (i) and (ii).
iv. We can integrate $\mu_1$ and $\mu_2$ out of the joint p.d.f. Integrating $\exp(-m\tau(\bar{x}_m - \mu_1)^2/2)$ over $\mu_1$ yields
$(2\pi)^{1/2}(m\tau)^{-1/2}$. Integrating $\exp(-n\tau(\bar{y}_n - \mu_2)^2/2)$ over $\mu_2$ yields $(2\pi)^{1/2}(n\tau)^{-1/2}$. So, the marginal
posterior p.d.f. of $\tau$ is a constant times $\tau^{(m+n-2)/2 - 1}\exp(-0.5\tau(s_x^2 + s_y^2))$. This is the p.d.f. of
a gamma distribution with parameters $(m+n-2)/2$ and $(s_x^2 + s_y^2)/2$, except for a constant
factor.


(c) Since $\mu_1$ and $\mu_2$ are independent, conditional on $\tau$, we have that $\mu_1 - \mu_2$ has a normal distribution
conditional on $\tau$ with mean equal to the difference of the means and variance equal to the sum
of the variances. That is, $\mu_1 - \mu_2$ has a normal distribution with mean $\bar{x}_m - \bar{y}_n$ and variance
$\tau^{-1}(1/m + 1/n)$ given $\tau$. If we subtract the mean and divide by the square-root of the variance,
we get a standard normal distribution for the result, which is the $Z$ stated in this part of the
problem. Since the standard normal distribution does not depend on $\tau$, $Z$ is independent of $\tau$
and has the standard normal distribution marginally.

(d) Recall that $\tau$ has a gamma distribution with parameters $(m+n-2)/2$ and $(s_x^2 + s_y^2)/2$. If we
multiply $\tau$ by $s_x^2 + s_y^2$, the result $W$ has a gamma distribution with the same first parameter but
with the second parameter divided by $s_x^2 + s_y^2$, namely 1/2.

(e) Since $Z$ and $W$ are independent with $Z$ having the standard normal distribution and $W$ having
the $\chi^2$ distribution with $m+n-2$ degrees of freedom, it follows from the definition of the $t$
distribution that $Z/(W/[m+n-2])^{1/2}$ has the $t$ distribution with $m+n-2$ degrees of freedom.
It is easy to check that $Z/(W/[m+n-2])^{1/2}$ is the same as (9.8.17).


9. (a) The null hypothesis can be rewritten as $\tau_1 \ge \tau_2$, where $\tau_i = 1/\sigma_i^2$. This can be further rewritten as
$\tau_1/\tau_2 \ge 1$. Using the usual improper prior for all parameters yields the posterior distribution of $\tau_1$
and $\tau_2$ to be that of independent gamma random variables with $\tau_1$ having parameters $(m-1)/2$
and $s_x^2/2$ while $\tau_2$ has parameters $(n-1)/2$ and $s_y^2/2$. Put another way, $\tau_1 s_x^2$ has the $\chi^2$ distribution
with $m-1$ degrees of freedom independent of $\tau_2 s_y^2$ which has the $\chi^2$ distribution with $n-1$ degrees
of freedom. This makes the distribution of

\[ W = \frac{\tau_1 s_x^2/(m-1)}{\tau_2 s_y^2/(n-1)} \]

the $F$ distribution with $m-1$ and $n-1$ degrees of freedom. The posterior probability that $H_0$ is
true is

\[ \Pr(\tau_1/\tau_2 \ge 1) = \Pr\left(W \ge \frac{s_x^2/(m-1)}{s_y^2/(n-1)}\right) = 1 - G_{m-1,n-1}\left(\frac{s_x^2/(m-1)}{s_y^2/(n-1)}\right). \]

The posterior probability is at most $\alpha_0$ if and only if

\[ \frac{s_x^2/(m-1)}{s_y^2/(n-1)} \ge G_{m-1,n-1}^{-1}(1 - \alpha_0). \]

This is exactly the form of the rejection region for the level $\alpha_0$ $F$ test of $H_0$.


(b) This is a special case of Exercise 7.

10. (a) Using Theorem 9.8.2, the posterior distribution of

\[ (26 + 26 - 2)^{1/2}\,\frac{\mu_1 - \mu_2 - [5.134 - 3.990]}{(1/26 + 1/26)^{1/2}(63.96 + 67.39)^{1/2}} = \frac{\mu_1 - \mu_2 - 1.144}{0.4495} \]

is the $t$ distribution with 50 degrees of freedom.


(b) We can compute

\[ \Pr(|\mu_1 - \mu_2| < d) = T_{50}\left(\frac{d - 1.144}{0.4495}\right) - T_{50}\left(\frac{-d - 1.144}{0.4495}\right), \]

where $T_{50}$ denotes the c.d.f. of the $t$ distribution with 50 degrees of freedom.
A plot of this function is in Fig. S.9.6.

[Figure S.9.6: Figure for Exercise 10b of Sec. 9.8. Vertical axis: Probability.]


11. First, let $H_0: \theta \in \Omega'$ and $H_1: \theta \in \Omega''$. Then $\Omega_0 = \Omega'$ and $\Omega_1 = \Omega''$. Since $d_0$ is the decision that
$H_0$ is true we have $d_0 = d'$ and $d_1 = d''$. Since $w_0$ is the cost of type I error, and type I error is
to choose $\theta \in \Omega''$ when $\theta \in \Omega'$, $w_0 = w'$ and $w_1 = w''$. It is straightforward to see that everything
switches for the other case.

The test procedure is to

\[ \text{choose } d_1 \text{ if } \Pr(\theta \in \Omega_0 \mid x) < \frac{w_1}{w_0 + w_1}, \qquad \text{(S.9.12)} \]

and choose either action if the two sides are equal. (S.9.13)

In the first case, this translates to "choose $d''$ if $\Pr(\theta \in \Omega' \mid x) < w''/(w' + w'')$, and choose either
action if the two sides are equal." This is equivalent to "choose $d'$ if $\Pr(\theta \in \Omega' \mid x) > w''/(w' + w'')$,
and choose either action if the two sides are equal." This, in turn, is equivalent to "choose $d'$
if $\Pr(\theta \in \Omega'' \mid x) < w'/(w' + w'')$, and choose either action if the two sides are equal." This
last statement, in the second case, translates to (S.9.12). Hence, the Bayes test produces the
same action ($d'$ or $d''$) regardless of which hypothesis you choose to call the null and which the
alternative.


9.9 Foundational Issues 


Commentary 


This section discusses some subtle issues that arise when the foundations of hypothesis testing are examined
closely. These issues are the relationship between sample size and the level of a test and the distinction
between statistical significance and practical significance. The term “statistical significance” is not introduced 
in the text until this section, hence instructors who do not wish to discuss this issue can avoid it altogether. 


Solutions to Exercises 


1. (a) When $\mu = 0$, $X$ has the standard normal distribution. Therefore, $c = 1.96$. Since $H_0$ should be
rejected if $|X| > c$, then $H_0$ will be rejected when $X = 2$.

(b)
\[ \frac{f(X \mid \mu = 0)}{f(X \mid \mu = 5)} = \frac{\exp\left(-\frac{1}{2}X^2\right)}{\exp\left[-\frac{1}{2}(X - 5)^2\right]} = \exp\left[\frac{1}{2}(25 - 10X)\right]. \]
When $X = 2$, this likelihood ratio has the value $\exp(5/2) = 12.2$. Also,
\[ \frac{f(X \mid \mu = 0)}{f(X \mid \mu = -5)} = \frac{\exp\left(-\frac{1}{2}X^2\right)}{\exp\left[-\frac{1}{2}(X + 5)^2\right]} = \exp\left[\frac{1}{2}(25 + 10X)\right]. \]
When $X = 2$, this likelihood ratio has the value $\exp(45/2) = 5.9 \times 10^{9}$.


2. When $\mu = 0$, $100\bar{X}_n$ has the standard normal distribution. Therefore, $\Pr(100|\bar{X}_n| > 1.96 \mid \mu = 0) =
0.05$. It follows that $c = 1.96/100 = 0.0196$.

(a) When $\mu = 0.01$, the random variable $Z = 100(\bar{X}_n - 0.01)$ has the standard normal distribution.
Therefore,

\[ \Pr(|\bar{X}_n| < c \mid \mu = 0.01) = \Pr(-1.96 < 100\bar{X}_n < 1.96 \mid \mu = 0.01) = \Pr(-2.96 < Z < 0.96) = 0.8315 - 0.0015 = 0.8300. \]

It follows that $\Pr(|\bar{X}_n| \ge c \mid \mu = 0.01) = 1 - 0.8300 = 0.1700$.

(b) When $\mu = 0.02$, the random variable $Z = 100(\bar{X}_n - 0.02)$ has the standard normal distribution.
Therefore,

\[ \Pr(|\bar{X}_n| < c \mid \mu = 0.02) = \Pr(-3.96 < Z < -0.04) = \Pr(0.04 < Z < 3.96) = 1 - 0.5160. \]

It follows that $\Pr(|\bar{X}_n| \ge c \mid \mu = 0.02) = 0.5160$.
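The rejection probabilities in parts (a) and (b) can be reproduced with the standard normal c.d.f. The sketch below is an illustration only; the function name `power` and the argument `scale` (the factor 100 that standardizes the sample mean) are hypothetical labels.

```python
from statistics import NormalDist

Z = NormalDist()          # standard normal distribution
C = 1.96 / 100.0          # critical value c for |sample mean|

def power(mu, scale=100.0, c=C):
    """Pr(|Xbar| >= c | mu), where scale * (Xbar - mu) is standard normal."""
    inside = Z.cdf(scale * (c - mu)) - Z.cdf(scale * (-c - mu))
    return 1.0 - inside

for mu in (0.0, 0.01, 0.02):
    print(mu, round(power(mu), 4))
```

At $\mu = 0$ the function returns the level of the test, and it increases as $\mu$ moves away from 0.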


3. When $\mu = 0$, $100\bar{X}_n$ has the standard normal distribution. The calculated value of $100\bar{X}_n$ is $100(0.03) =
3$. The corresponding tail area is $\Pr(100\bar{X}_n > 3) = 0.0013$.


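4. (a) According to Theorem 9.2.1, we reject $H_0$ if

\[ \frac{19}{(2\pi)^{n/2}}\exp\left(-\frac{1}{2}\sum_{i=1}^n x_i^2\right) < \frac{1}{(2\pi)^{n/2}}\exp\left(-\frac{1}{2}\sum_{i=1}^n [x_i - 0.5]^2\right). \]

This inequality is equivalent to

\[ \frac{2\log(19)}{n} + \frac{1}{4} < \bar{x}_n. \]

That is, $c_n = 2\log(19)/n + 1/4$. For $n = 1, 100, 10000$, the values of $c_n$ are 6.139, 0.3089, and
0.2506.

(b) The size of the test is
\[ \Pr(\bar{X}_n \ge c_n \mid \theta = 0) = 1 - \Phi(c_n \times n^{1/2}). \]
For $n = 1, 100, 10000$, the sizes are $4.152 \times 10^{-10}$, 0.001, and 0.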
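The critical values $c_n = 2\log(19)/n + 1/4$ and the sizes $1 - \Phi(c_n n^{1/2})$ can be tabulated directly. This is a minimal sketch; the function names are hypothetical, and the upper tail is computed with `math.erfc` rather than `1 - cdf` so that very small tail areas are not lost to rounding.

```python
import math

def upper_tail(z):
    """Pr(Z > z) for a standard normal Z, accurate far into the tail."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def critical_value(n):
    """c_n = 2*log(19)/n + 1/4."""
    return 2.0 * math.log(19.0) / n + 0.25

def size(n):
    """Size of the test that rejects when the sample mean is at least c_n."""
    return upper_tail(critical_value(n) * math.sqrt(n))

for n in (1, 100, 10000):
    print(n, round(critical_value(n), 4), size(n))
```

The output shows the size shrinking toward 0 as $n$ grows, which is the point of the exercise.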


5. (a) We want to choose $c_n$ so that
\[ 19[1 - \Phi(\sqrt{n}\,c_n)] = \Phi(\sqrt{n}\,[c_n - 0.5]). \]
Solving this equation must be done numerically. For $n = 1$, the equation is solved for $c_n = 1.681$.
For $n = 100$, we need $c_n = 0.3021$. For $n = 10000$, we need $c_n = 0.25$ (both sides are essentially
0).
(b) The size of the test is $1 - \Phi(c_n n^{1/2})$, which is 0.0464 for $n = 1$, 0.00126 for $n = 100$ and essentially
0 for $n = 10000$.
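One way to carry out the numerical solution in part (a) is bisection, since the difference between the two sides of the equation is strictly decreasing in $c_n$. The sketch below is an illustration under that observation; the function names and the bracketing interval $[0, 10]$ are assumptions, not part of the original solution.

```python
import math

def upper_tail(z):
    """Pr(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def lower_tail(z):
    """Pr(Z <= z) for a standard normal Z."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def solve_cn(n, lo=0.0, hi=10.0, iters=200):
    """Solve 19*[1 - Phi(sqrt(n)*c)] = Phi(sqrt(n)*(c - 0.5)) for c by bisection."""
    root_n = math.sqrt(n)

    def g(c):
        # Positive to the left of the root, negative to the right.
        return 19.0 * upper_tail(root_n * c) - lower_tail(root_n * (c - 0.5))

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for n in (1, 100):
    c = solve_cn(n)
    print(n, round(c, 4), round(upper_tail(c * math.sqrt(n)), 5))
```

The second printed column is $c_n$ and the third is the size from part (b).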
9.10 Supplementary Exercises


Solutions to Exercises 
1. According to Theorem 9.2.1, we want to reject $H_0$ when
\[ (1/2)^3 < (3/4)^{x}(1/4)^{3-x}. \]

We don't reject $H_0$ when the reverse inequality holds, and we can do either if equality holds. The
inequality above can be simplified to $x > \log(8)/\log(3) = 1.892$. That is, we reject $H_0$ if $X$ is 2 or 3,
and we don't reject $H_0$ if $X$ is 0 or 1. The probability of type I error is $3(1/2)^3 + (1/2)^3 = 1/2$ and the
probability of type II error is $(1/4)^3 + 3(1/4)^2(3/4) = 5/32$.
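The rejection region and the two error probabilities can be verified with exact rational arithmetic. This is a small sketch assuming, as the displayed probabilities indicate, a binomial count with $n = 3$, success probability 1/2 under $H_0$ and 3/4 under $H_1$; the helper name `binom_pmf` is hypothetical.

```python
from fractions import Fraction
from math import comb

def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n = 3
p0, p1 = Fraction(1, 2), Fraction(3, 4)

# Reject H0 when the likelihood under H0 is smaller than under H1.
reject = [x for x in range(n + 1) if binom_pmf(x, n, p0) < binom_pmf(x, n, p1)]

alpha = sum(binom_pmf(x, n, p0) for x in reject)                               # type I error
beta = sum(binom_pmf(x, n, p1) for x in range(n + 1) if x not in reject)       # type II error
print(reject, alpha, beta)
```

Using `Fraction` keeps the answers in the same form as the text, 1/2 and 5/32.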

2. The probability of an error of type 1 is

\[ \alpha = \Pr(\text{Rej. } H_0 \mid H_0) = \Pr(X \le 5 \mid \theta = 0.1) = 1 - (.9)^5 = .41. \]

The probability of an error of type 2 is

\[ \beta = \Pr(\text{Acc. } H_0 \mid H_1) = \Pr(X \ge 6 \mid \theta = 0.2) = (.8)^5 = .33. \]


3. It follows from Sec. 9.2 that the Bayes test procedure rejects $H_0$ when $f_1(x)/f_0(x) > 1$. In this problem,
\[ f_1(x) = (.8)^{x-1}(.2) \quad \text{for } x = 1, 2, \ldots, \]
and
\[ f_0(x) = (.9)^{x-1}(.1) \quad \text{for } x = 1, 2, \ldots. \]

Hence, $H_0$ should be rejected when $2(8/9)^{x-1} > 1$ or $x - 1 < 5.885$. Thus, $H_0$ should be rejected for
$X \le 6$.


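4. It follows from Theorem 9.2.1 that the desired test will reject $H_0$ if

\[ \frac{f_1(x)}{f_0(x)} = \frac{f(x \mid \theta = 0)}{f(x \mid \theta = 2)} > \frac{1}{2}. \]

In this exercise, the ratio on the left side reduces to $x/(1 - x)$. Hence, the test specifies rejecting
$H_0$ if $x > 1/3$. For this test,

\[ \alpha(\delta) = \Pr\left(X > \tfrac{1}{3} \,\Big|\, \theta = 2\right) = \frac{4}{9}, \]
\[ \beta(\delta) = \Pr\left(X < \tfrac{1}{3} \,\Big|\, \theta = 0\right) = \frac{1}{9}. \]

Hence, $\alpha(\delta) + 2\beta(\delta) = 2/3$.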
5. It follows from the previous exercise and the Neyman-Pearson lemma that the optimal procedure $\delta$
specifies rejecting $H_0$ when $x/(1 - x) > k'$ or, equivalently, when $x > k$. The constant $k$ must be chosen
so that

\[ \alpha = \Pr(X > k \mid \theta = 2) = \int_k^1 2(1 - x)\,dx = (1 - k)^2. \]

Hence, $k = 1 - \alpha^{1/2}$ and
\[ \beta(\delta) = \Pr(X < k \mid \theta = 0) = k^2 = (1 - \alpha^{1/2})^2. \]
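6. (a) The power function is given by
\[ \pi(\theta \mid \delta) = \Pr(X > 0.9 \mid \theta) = \int_{0.9}^{1} f(x \mid \theta)\,dx = .19 - .09\theta. \]

(b) The size of $\delta$ is

\[ \sup_{\theta \ge 1} \pi(\theta \mid \delta) = .10. \]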


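7. A direct calculation shows that for $\theta_1 < \theta_2$,

\[ \frac{d}{dx}\left[\frac{f(x \mid \theta_2)}{f(x \mid \theta_1)}\right] = \frac{2(\theta_1 - \theta_2)}{[2(1 - \theta_1)x + \theta_1]^2} < 0. \]

Hence, the ratio $f(x \mid \theta_2)/f(x \mid \theta_1)$ is a decreasing function of $x$ or, equivalently, an increasing function
of $r(x) = -x$. It follows from Theorem 9.3.1 that a UMP test of the given hypotheses will reject $H_0$
when $r(X) \ge c$ or, equivalently, when $X \le k$. Hence, $k$ must be chosen so that

\[ .05 = \Pr\left(X \le k \,\Big|\, \theta = \tfrac{1}{2}\right) = \int_0^k f\left(x \,\Big|\, \theta = \tfrac{1}{2}\right)dx = \frac{1}{2}(k^2 + k), \quad \text{or} \quad k = \frac{1}{2}\left(\sqrt{1.4} - 1\right). \]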


8. Suppose that the proportions of red, brown, and blue chips are $p_1$, $p_2$, and $p_3$, respectively. It follows
from the multinomial distribution that the probability of obtaining exactly one chip of each color is

\[ \frac{3!}{1!\,1!\,1!}\,p_1 p_2 p_3 = 6p_1 p_2 p_3. \]

Hence, $\Pr(\text{Rej. } H_0 \mid p_1, p_2, p_3) = 1 - 6p_1 p_2 p_3$.

(a) The size of the test is

\[ \alpha = \Pr\left(\text{Rej. } H_0 \,\Big|\, \tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}\right) = \frac{7}{9}. \]

(b) The power under the given alternative distribution is $\Pr(\text{Rej. } H_0 \mid 1/7, 2/7, 4/7) = 295/343 = .860$.


9. Let $f_i(x)$ denote the p.d.f. of $X$ under the hypothesis $H_i$ $(i = 0, 1)$. Then

\[ \frac{f_1(x)}{f_0(x)} = \begin{cases} \infty & \text{for } x \le 0 \text{ or } x \ge 1, \\ \varphi(x) & \text{for } 0 < x < 1, \end{cases} \]

where $\varphi(x)$ is the standard normal p.d.f. The most powerful test $\delta$ of size 0.01 rejects $H_0$ when
$f_1(x)/f_0(x) > k$. Since $\varphi(x)$ is strictly decreasing for $0 < x < 1$, it follows that $\delta$ will reject $H_0$ if
$X \le 0$, $X \ge 1$, or $0 < X < c$, where $c$ is chosen so that $\Pr(0 < X < c \mid H_0) = .01$. Since $X$ has a
uniform distribution under $H_0$, $c = .01$. Thus, $\delta$ specifies rejecting $H_0$ if $X \le .01$ or $X \ge 1$. The power
of $\delta$ under $H_1$ is

\[ \Pr(X \le .01 \mid H_1) + \Pr(X \ge 1 \mid H_1) = \Phi(.01) + [1 - \Phi(1)] = .5040 + .1587 = .6627. \]


10. The usual $t$ statistic $U$ is defined by Eq. (9.5.2) with $n = 12$ and $\mu_0 = 3$. Because the one-sided
hypotheses $H_0$ and $H_1$ are reversed from those in (9.5.1), we now want to reject $H_0$ if $U \le c$. If
$\mu = 3$, then $U$ has the $t$ distribution with 11 degrees of freedom. Under these conditions, we want
$\Pr(U \le c) = 0.005$ or equivalently, by the symmetry of the $t$ distribution, $\Pr(U \le -c) = 0.995$. It is
found from the table of the $t$ distribution that $-c = 3.106$. Hence, $H_0$ should be rejected if $U \le -3.106$.


11. It is known from Example 4 that the UMP test rejects $H_0$ if $\bar{X}_n \ge c$. Hence, $c$ must be chosen so that
\[ 0.95 = \Pr(\bar{X}_n \ge c \mid \theta = 1) = \Pr[Z \ge \sqrt{n}(c - 1)], \]
where $Z$ has the standard normal distribution. Hence, $\sqrt{n}(c - 1) = -1.645$, and $c = 1 - (1.645)/n^{1/2}$.

Since the power function of this test will be a strictly increasing function of $\theta$, the size of the test will
be

\[ \alpha = \sup_{\theta \le 0} \Pr(\text{Rej. } H_0 \mid \theta) = \Pr(\text{Rej. } H_0 \mid \theta = 0) = \Pr\left(\bar{X}_n \ge 1 - \frac{1.645}{n^{1/2}} \,\Big|\, \theta = 0\right) = \Pr(Z \ge n^{1/2} - 1.645), \]

where $Z$ again has the standard normal distribution. When $n = 16$,

\[ \alpha = \Pr(Z \ge 2.355) = .0093. \]


12. For $\theta_1 < \theta_2$, the likelihood ratio $f_n(x \mid \theta_2)/f_n(x \mid \theta_1)$
is an increasing function of $T = \prod_{i=1}^{8} x_i$. Hence, the UMP test specifies rejecting $H_0$ when $T \ge c$
or, equivalently, when $-2\sum_{i=1}^{8} \log X_i \le k$. The reason for expressing the test in this final form is that
when $\theta = 1$, the observations $X_1,\ldots,X_8$ are i.i.d. and each has a uniform distribution on the interval
$(0,1)$. Under these conditions, $-2\sum_{i=1}^{8} \log X_i$ has a $\chi^2$ distribution with $2n = 16$ degrees of freedom (see
Exercise 7 of Sec. 8.2 and Exercise 5 of Sec. 8.9). Hence, in accordance with Theorem 9.3.1, $H_0$ should
be rejected if $-2\sum_{i=1}^{8} \log X_i \le 7.962$, which is the 0.05 quantile of the $\chi^2$ distribution with 16 degrees of
freedom, or equivalently if $\sum_{i=1}^{8} \log x_i \ge -3.981$.
13. The $\chi^2$ distribution with $\theta$ degrees of freedom is a gamma distribution with parameters $\alpha = \theta/2$ and $\beta =
1/2$. Hence, it follows from Exercise 3 of Sec. 9.3 that the joint p.d.f. of $X_1,\ldots,X_n$ has a monotone
likelihood ratio in the statistic $T = \prod_{i=1}^{n} X_i$. Hence, there is a UMP test of the given hypotheses, and it
specifies rejecting $H_0$ when $T \ge c$ or, equivalently, when $\log T = \sum_{i=1}^{n} \log X_i \ge k$.

14. Let $\bar{X}_1$ be the average of the four observations $X_1,\ldots,X_4$ and let $\bar{X}_2$ be the average of the six
observations $X_5,\ldots,X_{10}$. Let $S_1^2 = \sum_{i=1}^{4}(X_i - \bar{X}_1)^2$ and $S_2^2 = \sum_{i=5}^{10}(X_i - \bar{X}_2)^2$. Then $S_1^2/\sigma^2$ and $S_2^2/\sigma^2$ have
independent $\chi^2$ distributions with 3 and 5 degrees of freedom, respectively. Hence, $(5S_1^2)/(3S_2^2)$ has the
desired $F$ distribution.


15. It was shown in Sec. 9.7 that the $F$ test rejects $H_0$ if $V \ge 2.20$, where $V$ is given by (9.7.4) and 2.20 is
the 0.95 quantile of the $F$ distribution with 15 and 20 degrees of freedom. For any values of $\sigma_1^2$ and $\sigma_2^2$,
the random variable $V^*$ given by (9.7.5) has the $F$ distribution with 15 and 20 degrees of freedom.
When $\sigma_1^2 = 2\sigma_2^2$, $V^* = V/2$. Hence, the power when $\sigma_1^2 = 2\sigma_2^2$ is

\[ P^*(\text{Rej. } H_0) = P^*(V \ge 2.20) = P^*\left(\tfrac{1}{2}V \ge 1.10\right) = \Pr(V^* \ge 1.1), \]

where $P^*$ denotes a probability calculated under the assumption that $\sigma_1^2 = 2\sigma_2^2$.


16. The ratio $V = S_X^2/S_Y^2$ has the $F$ distribution with 8 and 8 degrees of freedom, and so does $1/V =
S_Y^2/S_X^2$. Thus,

\[ .05 = \Pr(T > c) = \Pr(V > c) + \Pr(1/V > c) = 2\Pr(V > c). \]

It follows that $c$ must be the .975 quantile of the distribution of $V$, which is found from the tables to
be 4.43.


17. (a) Carrying out a test of size $\alpha$ on repeated independent samples is like performing a sequence of
Bernoulli trials on each of which the probability of success is $\alpha$. With probability 1, a success
will ultimately be obtained. Thus, sooner or later, $H_0$ will ultimately be rejected. Therefore, the
overall size of the test is 1.

(b) As we know from the geometric distribution, the expected number of samples, or trials, until a success
is obtained is $1/\alpha$.


18. If $U$ is defined as in Eq. (8.6.9), then the prior distribution of $U$ is the $t$ distribution with $2\alpha_0 = 2$
degrees of freedom. Since the $t$ distribution is symmetric with respect to the origin, it follows that
under the prior distribution, $\Pr(H_0) = \Pr(\mu \le 3) = \Pr(U \le 0) = 1/2$. It follows from (8.6.1) and
(8.6.2) that under the posterior distribution,

\[ \mu_1 = \frac{3 + (17)(3.2)}{1 + 17} = 3.189, \qquad \lambda_1 = 18, \]
\[ \alpha_1 = 1 + \frac{17}{2} = 9.5, \]
\[ \beta_1 = 1 + \frac{1}{2}(17) + \frac{(17)(.04)}{2(18)} = 9.519. \]

If we now define $Y$ to be the random variable in Eq. (8.6.12) then $Y = (4.24)(\mu - 3.19)$ and $Y$ has the
$t$ distribution with $2\alpha_1 = 19$ degrees of freedom. Thus, under the posterior distribution,

\[ \Pr(H_0) = \Pr(\mu \le 3) = \Pr[Y \le (4.24)(3 - 3.19)] = \Pr(Y \le -.81) = \Pr(Y \ge .81). \]

It is found from the table of the $t$ distribution with 19 degrees of freedom that this probability is
approximately 0.21.
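The posterior hyperparameters above follow the usual normal-gamma update, which can be sketched as follows. The prior values $\mu_0 = 3$, $\lambda_0 = 1$, $\alpha_0 = 1$, $\beta_0 = 1$ and the data summaries $n = 17$, $\bar{x}_n = 3.2$, $s_n^2 = 17$ are read off the displayed arithmetic and should be treated as assumptions; the function name is hypothetical.

```python
import math

def normal_gamma_update(mu0, lam0, a0, b0, n, xbar, s2):
    """Posterior hyperparameters for a normal-gamma prior.

    s2 is the sum of squared deviations of the observations from xbar.
    """
    lam1 = lam0 + n
    mu1 = (lam0 * mu0 + n * xbar) / lam1
    a1 = a0 + n / 2.0
    b1 = b0 + 0.5 * s2 + n * lam0 * (xbar - mu0) ** 2 / (2.0 * lam1)
    return mu1, lam1, a1, b1

mu1, lam1, a1, b1 = normal_gamma_update(3.0, 1.0, 1.0, 1.0, 17, 3.2, 17.0)

# Scale factor that turns (mu - mu1) into a t variable with 2*a1 degrees of freedom.
scale = math.sqrt(lam1 * a1 / b1)
t_value = scale * (3.0 - mu1)       # value at which the t c.d.f. is evaluated
print(round(mu1, 3), lam1, a1, round(b1, 3), round(scale, 2), round(t_value, 2))
```

Without rounding $\mu_1$ to 3.19, the standardized value is about $-0.80$ rather than $-.81$; the resulting probability is still approximately 0.21.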


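19. At each point $\theta \in \Omega_1$, $\pi(\theta \mid \delta)$ must be at least as large as it is at any point in $\Omega_0$, because $\delta$ is unbiased.
But $\sup_{\theta \in \Omega_0} \pi(\theta \mid \delta) = \alpha$, so $\pi(\theta \mid \delta) \ge \alpha$ at every point $\theta \in \Omega_1$.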


20. Since $\delta$ is unbiased and has size $\alpha$, it follows from the previous exercise that $\pi(\theta \mid \delta) \le \alpha$ for all $\theta$ inside
the circle $A$ and $\pi(\theta \mid \delta) \ge \alpha$ for all $\theta$ outside $A$. Since $\pi(\theta \mid \delta)$ is continuous, it must therefore be equal
to $\alpha$ everywhere on the boundary of $A$. Note that this result is true regardless of whether all or any
part of the boundary belongs to $H_0$ or $H_1$.


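21. Since $H_0$ is simple and $\delta$ has size $\alpha$, then $\pi(\theta_0 \mid \delta) = \alpha$. Since $\delta$ is unbiased, $\pi(\theta \mid \delta) \ge \alpha$ for all other
values of $\theta$. Therefore, $\pi(\theta \mid \delta)$ is a minimum at $\theta = \theta_0$. Since $\pi$ is assumed to be differentiable, it
follows that $\pi'(\theta_0 \mid \delta) = 0$.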


22. (a) We want $\Pr(X > c_1 \mid H_0) = \Pr(Y > c_2 \mid H_0) = .05$. Under $H_0$, both $X$ and $Y/10$ are standard
normal. Therefore, $c_1 = 1.645$ and $c_2 = 16.45$.

(b) The most powerful test of size $\alpha_0$, conditional on observing $X$ with a variance of $\sigma^2$, is to reject
$H_0$ if $X > \sigma\Phi^{-1}(1 - \alpha_0)$. In this problem we are asked to find two such tests: one with $\sigma = 1$ and
$\alpha_0 = 2.0 \times 10^{-7}$ and the other with $\sigma = 10$ and $\alpha_0 = 0.0999998$. The resulting critical values are

\[ \Phi^{-1}(1 - 2.0 \times 10^{-7}) = 5.069, \]
\[ 10\,\Phi^{-1}(1 - 0.0999998) = 12.8155. \]

(c) The overall size of a test in this problem is the average of the two conditional sizes, since the
two types of meteorological conditions have probability 1/2 each. In part (a), the two conditional
sizes are both 0.05, so that is the average as well. In part (b), the average of the two sizes is
$(2.0 \times 10^{-7} + 0.0999998)/2 = 0.05$ also. The powers are also the averages of the two conditional
powers. The power of the conditional size $\alpha_0$ test with variance $\sigma^2$ is

\[ 1 - \Phi(\sigma\Phi^{-1}(1 - \alpha_0) - 10). \]

The results are tabulated below:

Part    Good        Poor        Average
(a)     1           0           0.5
(b)     0.9999996   0.002435    0.5012

23. (a) The data consist of both $X$ and $Y$, where $X$ is defined in Exercise 22 and $Y = 1$ if meteorological
conditions are poor and $Y = 0$ if not. The joint p.f./p.d.f. of $(X, Y)$ given $\Theta = \theta$ is

\[ \frac{1}{2(2\pi)^{1/2}10^{y}}\exp\left(-\frac{1-y}{2}[x - \theta]^2 - \frac{y}{200}[x - \theta]^2\right). \]

The Bayes test will choose $H_0$ when

\[ w_0\xi_0\,\frac{1}{2(2\pi)^{1/2}10^{y}}\exp\left(-\frac{1-y}{2}x^2 - \frac{y}{200}x^2\right) > w_1\xi_1\,\frac{1}{2(2\pi)^{1/2}10^{y}}\exp\left(-\frac{1-y}{2}[x - 10]^2 - \frac{y}{200}[x - 10]^2\right). \]

It will choose $H_1$ when the reverse inequality holds, and it can do either when equality holds. This
inequality can be rewritten by splitting according to the value of $y$. That is, choose $H_0$ if

\[ x < 5 + \log(w_0\xi_0/(w_1\xi_1))/10 \quad \text{if } y = 0, \]
\[ x < 5 + 10\log(w_0\xi_0/(w_1\xi_1)) \quad \text{if } y = 1. \]

In order for a test to be of the form of part (a), the two critical values $c_0$ and $c_1$ used for $y = 0$ and
$y = 1$ respectively must satisfy $c_1 - 5 = 100(c_0 - 5)$. In part (a) of Exercise 22, the two critical
values are $c_0 = 1.645$ and $c_1 = 16.45$. These do not even approximately satisfy $c_1 - 5 = 100(c_0 - 5)$.

In part (b) of Exercise 22, the two critical values are $c_0 = 5.069$ and $c_1 = 12.8155$. These
approximately satisfy $c_1 - 5 = 100(c_0 - 5)$.

24. The Poisson distribution has M.L.R. in $Y$, so rejecting $H_0$ when $Y \le c$ is a UMP test of its size.
With $c = 0$, the size is $\Pr(Y = 0 \mid \theta = 1) = \exp(-n)$.

The power function of the test is $\Pr(Y = 0 \mid \theta) = \exp(-n\theta)$.


25. Let $I$ be the random interval that corresponds to the UMP test, and let $J$ be a random interval that
corresponds to some other level $\alpha_0$ test. Translating UMP into what it says about the random interval
$I$ compared to $J$, we have for all $\theta > c$

\[ \Pr(c \in I \mid \theta) \le \Pr(c \in J \mid \theta). \]

In other words, the observed value of $I$ is a uniformly most accurate coefficient $1 - \alpha_0$ confidence interval
if, for every random interval $J$ such that the observed value of $J$ is a coefficient $1 - \alpha_0$ confidence interval
and for all $\theta_2 > \theta_1$,

\[ \Pr(\theta_1 \in I \mid \theta = \theta_2) \le \Pr(\theta_1 \in J \mid \theta = \theta_2). \]


Copyright © 2012 Pearson Education, Inc. Publishing as Addison-Wesley. 


Chapter 10 


Categorical Data and Nonparametric 
Methods 


10.1 Tests of Goodness-of-Fit 


Commentary 


This section ends with a discussion of some issues related to the meaning of the $\chi^2$ goodness-of-fit test for
readers who want a deeper understanding of the procedure.


Solutions to Exercises 


1. Let $Y = N_1$, the number of defective items, and let $\theta = p_1$, the probability that each item is defective.
The level $\alpha_0$ test requires us to choose $c_1$ and $c_2$ such that $\Pr(Y \le c_1 \mid \theta = 0.1) + \Pr(Y \ge c_2 \mid \theta = 0.1)$
is close to $\alpha_0$. We can compute the probability that $Y = y$ for each $y = 0,\ldots,100$ and arrange the
numbers from smallest to largest. The smallest values correspond to large values of $y$ down to $y = 25$,
then some values corresponding to small values of $y$ start to appear in the list. The sum of the values
reaches 0.0636 when $c_1 = 4$ and $c_2 = 16$. So $\alpha_0 = 0.0636$ is the smallest $\alpha_0$ for which we would reject
$H_0: \theta = 0.1$ using such a test.
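The tail sum quoted above can be checked by adding binomial probabilities directly. The sketch below assumes $n = 100$ items, as implied by the range $y = 0,\ldots,100$; the helper name `binom_pmf` is hypothetical.

```python
from math import comb

def binom_pmf(y, n, p):
    """Binomial probability that Y = y with n trials and success probability p."""
    return comb(n, y) * p**y * (1.0 - p) ** (n - y)

n, p = 100, 0.1
c1, c2 = 4, 16

# Pr(Y <= c1) + Pr(Y >= c2) under theta = 0.1
tail_sum = sum(binom_pmf(y, n, p) for y in range(n + 1) if y <= c1 or y >= c2)
print(round(tail_sum, 4))
```

The lower tail contributes roughly 0.024 and the upper tail roughly 0.040 to this sum.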


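2.
\[ Q = \sum_{i=1}^{k} \frac{(N_i - n/k)^2}{n/k} = \frac{k}{n}\sum_{i=1}^{k}\left(N_i^2 - 2\frac{n}{k}N_i + \frac{n^2}{k^2}\right) = \frac{k}{n}\left(\sum_{i=1}^{k} N_i^2 - 2\frac{n^2}{k} + \frac{n^2}{k}\right) = \left(\frac{k}{n}\sum_{i=1}^{k} N_i^2\right) - n. \]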

3. We obtain the following frequencies:

i       0   1   2   3   4   5   6   7   8   9
N_i    25  16  19  20  20  22  24  15  14  25

Since $p_i^0 = 1/10$ for every value of $i$, and $n = 200$, we find from Eq. (10.1.2) that $Q = 7.4$. If $Q$ has
the $\chi^2$ distribution with 9 degrees of freedom, $\Pr(Q \ge 7.4) = 0.6$.
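The statistic $Q = \sum_i (N_i - np_i^0)^2/(np_i^0)$ for these frequencies can be computed in a few lines. This is a minimal sketch; the function name `chi_square_stat` is hypothetical.

```python
def chi_square_stat(counts, probs):
    """Chi-square goodness-of-fit statistic for observed counts and null probabilities."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))

counts = [25, 16, 19, 20, 20, 22, 24, 15, 14, 25]   # frequencies of the digits 0..9
q = chi_square_stat(counts, [1 / 10] * 10)
print(round(q, 2))
```

The same function applies to the tables in the later exercises of this section once the null probabilities are supplied.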
4. We obtain the following table:

N_i       10   10    4
np_i^0     6   12    6

It is found from Eq. (10.1.2) that $Q = 11/3$. If $Q$ has a $\chi^2$ distribution with 2 degrees of freedom, then
the value of $\Pr(Q \ge 11/3)$ is between 0.1 and 0.2.


5. (a) The number of successes is $n\bar{X}_n$ and the number of failures is $n(1 - \bar{X}_n)$. Therefore,

\[ Q = \frac{(n\bar{X}_n - np_0)^2}{np_0} + \frac{[n(1 - \bar{X}_n) - n(1 - p_0)]^2}{n(1 - p_0)} = n(\bar{X}_n - p_0)^2\left(\frac{1}{p_0} + \frac{1}{1 - p_0}\right) = \frac{n(\bar{X}_n - p_0)^2}{p_0(1 - p_0)}. \]

(b) If $p = p_0$, then $E(\bar{X}_n) = p_0$ and $\mathrm{Var}(\bar{X}_n) = p_0(1 - p_0)/n$. Therefore, by the central limit theorem,
the c.d.f. of
\[ Z = \frac{\bar{X}_n - p_0}{[p_0(1 - p_0)/n]^{1/2}} \]
converges to the c.d.f. of the standard normal distribution. Since $Q = Z^2$, the c.d.f. of $Q$ will
converge to the c.d.f. of the $\chi^2$ distribution with 1 degree of freedom.


6. Here, $p_0 = 0.3$, $n = 50$, and $\bar{X}_n = 21/50$. By Exercise 5, $Q = 50(0.12)^2/[(0.3)(0.7)] = 3.43$. If $Q$ has a $\chi^2$ distribution with 1
degree of freedom, then $\Pr(Q \ge 3.4)$ is slightly greater than 0.05.


7. We obtain the following table:

          0 < x < 0.2   0.2 < x < 0.5   0.5 < x < 0.8   0.8 < x < 1
N_i           391            490             580            339
np_i^0        360            540             540            360

If $Q$ has a $\chi^2$ distribution with 3 degrees of freedom, then $\Pr(Q \ge 11.34) = 0.01$. Therefore, we should
reject $H_0$ if $Q \ge 11.34$. It is found from Eq. (10.1.2) that $Q = 11.5$.


8. If $Z$ denotes a random variable having a standard normal distribution and $X$ denotes the height of a
man selected at random from the city, then

\[ \Pr(X < 66) = \Pr(Z < -2) = 0.0227, \]
\[ \Pr(66 < X < 67.5) = \Pr(-2 < Z < -0.5) = 0.2858, \]
\[ \Pr(67.5 < X < 68.5) = \Pr(-0.5 < Z < 0.5) = 0.3830, \]
\[ \Pr(68.5 < X < 70) = \Pr(0.5 < Z < 2) = 0.2858, \]
\[ \Pr(X > 70) = \Pr(Z > 2) = 0.0227. \]

Therefore, we obtain the following table:

                      N_i    np_i^0
x < 66                 18     11.35
66 < x < 67.5         177    142.9
67.5 < x < 68.5       198    191.5
68.5 < x < 70         102    142.9
x > 70                  5     11.35

It is found from Eq. (10.1.2) that $Q = 27.5$. If $Q$ has a $\chi^2$ distribution with 4 degrees of freedom, then
$\Pr(Q > 27.5)$ is much less than 0.005.
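The cell probabilities, expected counts and $Q$ can be recomputed from the normal c.d.f. The sketch below assumes heights with mean 68 and standard deviation 1, which is inferred from the standardized cut points ($66 \mapsto -2$, $67.5 \mapsto -0.5$) rather than stated in the solution.

```python
from statistics import NormalDist

heights = NormalDist(mu=68.0, sigma=1.0)    # assumed null distribution of heights
cuts = [66.0, 67.5, 68.5, 70.0]             # interval boundaries
counts = [18, 177, 198, 102, 5]             # observed counts N_i

# c.d.f. at the boundaries, padded with 0 and 1 for the two open-ended cells
cdf = [0.0] + [heights.cdf(c) for c in cuts] + [1.0]
probs = [hi - lo for lo, hi in zip(cdf, cdf[1:])]

n = sum(counts)
expected = [n * p for p in probs]
q = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
print([round(e, 2) for e in expected], round(q, 2))
```

The expected counts differ slightly from the table because the table uses probabilities rounded to four decimals; $Q$ still rounds to 27.5.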


9. (a) The five intervals, each of which has probability 0.2, are as follows:
$(-\infty, -0.842)$, $(-0.842, -0.253)$, $(-0.253, 0.253)$, $(0.253, 0.842)$, $(0.842, \infty)$.

We obtain the following table:

                           N_i   np_i^0
-∞ < x < -0.842             15     10
-0.842 < x < -0.253         10     10
-0.253 < x < 0.253           7     10
0.253 < x < 0.842           12     10
0.842 < x < ∞                6     10

The calculated value of $Q$ is 5.4. If $Q$ has a $\chi^2$ distribution with 4 degrees of freedom, then
$\Pr(Q \ge 5.4) = 0.25$.

(b) The ten intervals, each of which has probability 0.1, are as given in the following table:

                           N_i   np_i^0
-∞ < x < -1.282              8      5
-1.282 < x < -0.842          7      5
-0.842 < x < -0.524          3      5
-0.524 < x < -0.253          7      5
-0.253 < x < 0               5      5
0 < x < 0.253                2      5
0.253 < x < 0.524            5      5
0.524 < x < 0.842            7      5
0.842 < x < 1.282            2      5
1.282 < x < ∞                4      5

The calculated value of $Q$ is 8.8. If $Q$ has the $\chi^2$ distribution with 9 degrees of freedom, then the
value of $\Pr(Q \ge 8.8)$ is between 0.4 and 0.5.


10.2 Goodness-of-Fit for Composite Hypotheses 


Commentary 


The maximization of the log-likelihood in Eq. (10.2.5) could be performed numerically if one had appropriate 
software. The R functions optim and nlm can be used as described in the Commentary to Sec. 7.6 in this 
manual. 
Solutions to Exercises.


1. There are many ways to perform a $\chi^2$ test. For example, we could divide the real numbers into the
intervals $(-\infty, 15]$, $(15, 30]$, $(30, 45]$, $(45, 60]$, $(60, 75]$, $(75, 90]$, $(90, \infty)$. The numbers of observations in
these intervals are 14, 14, 4, 4, 3, 0, 2.

(a) The M.L.E.’s of the parameters of a normal distribution are $\hat{\mu} = 30.05$ and $\hat{\sigma}^2 = 537.51$. Using
the method of Chernoff and Lehmann, we compute two different p-values with 6 and 4 degrees
of freedom. The probabilities for the seven intervals are 0.2581, 0.2410, 0.2413, 0.1613, 0.0719,
0.0214, 0.0049. The expected counts are 41 times each of these numbers. This makes $Q = 24.53$.
The two p-values are both smaller than 0.0005.

(b) The M.L.E.’s of the parameters of a lognormal distribution are $\hat{\mu} = 3.153$ and $\hat{\sigma}^2 = 0.48111$. Using
the method of Chernoff and Lehmann, we compute two different p-values with 6 and 4 degrees
of freedom. The probabilities for the seven intervals are 0.2606, 0.3791, 0.1872, 0.0856, 0.0407,
0.0205, 0.0261. The expected counts are 41 times each of these numbers. This makes $Q = 5.714$.
The two p-values are both larger than 0.2.


2. First, we must find the M.L.E. of $\theta$. From Eq. (10.2.5), ignoring the multinomial coefficient,

$$L(\theta) = \prod_{i=0}^{4} [\pi_i(\theta)]^{N_i} = C\,\theta^{N_1 + 2N_2 + 3N_3 + 4N_4} (1 - \theta)^{4N_0 + 3N_1 + 2N_2 + N_3}, \quad \text{where } C = 4^{N_1} 6^{N_2} 4^{N_3}.$$

Therefore,

$$\log L(\theta) = \log C + (N_1 + 2N_2 + 3N_3 + 4N_4) \log \theta + (4N_0 + 3N_1 + 2N_2 + N_3) \log(1 - \theta).$$

By solving the equation $\partial \log L(\theta)/\partial\theta = 0$, we obtain the result

$$\hat{\theta} = \frac{N_1 + 2N_2 + 3N_3 + 4N_4}{4(N_0 + N_1 + N_2 + N_3 + N_4)} = \frac{N_1 + 2N_2 + 3N_3 + 4N_4}{4n}.$$

It is found that $\hat{\theta} = 0.4$. Therefore, we obtain the following table:

    No. of games    $N_i$    $n\pi_i(\hat{\theta})$
         0            33        25.92
         1            67        69.12
         2            66        69.12
         3            15        30.72
         4            19         5.12

It is found from Eq. (10.2.4) that $Q = 47.81$. If $Q$ has a $\chi^2$ distribution with 5 - 1 - 1 = 3 degrees of
freedom, then $\Pr(Q > 47.81)$ is less than 0.005.
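The closed-form M.L.E. can be cross-checked by maximizing the log-likelihood numerically, in the spirit of the Commentary to this section. The sketch below uses a plain golden-section search written in Python rather than R's optim or nlm; the search routine and all names are illustrative assumptions, and the counts are the five values of N_i from the table.

```python
from math import comb, log, sqrt

counts = [33, 67, 66, 15, 19]                      # N_0, ..., N_4
n = sum(counts)
wins = sum(i * c for i, c in enumerate(counts))           # N_1 + 2N_2 + 3N_3 + 4N_4
losses = sum((4 - i) * c for i, c in enumerate(counts))   # 4N_0 + 3N_1 + 2N_2 + N_3


def log_likelihood(theta):
    """log L(theta) without the constant log C."""
    return wins * log(theta) + losses * log(1 - theta)


def golden_section_max(f, lo, hi, tol=1e-9):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    ratio = (sqrt(5) - 1) / 2
    while hi - lo > tol:
        c = hi - ratio * (hi - lo)
        d = lo + ratio * (hi - lo)
        if f(c) > f(d):
            hi = d      # the maximum lies in [lo, d]
        else:
            lo = c      # the maximum lies in [c, hi]
    return (lo + hi) / 2


theta_closed = wins / (4 * n)
theta_numeric = golden_section_max(log_likelihood, 1e-6, 1 - 1e-6)

# Expected counts n * pi_i(theta_hat) under the binomial model with 4 games.
expected = [n * comb(4, i) * theta_closed ** i * (1 - theta_closed) ** (4 - i)
            for i in range(5)]
q = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
print(theta_closed, round(theta_numeric, 6), round(q, 2))
```

Both routes give an estimate of 0.4 to the displayed precision, and the statistic agrees with the value 47.81 reported above.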


3. (a) It follows from Eqs. (10.2.2) and (10.2.6) that (aside from the multinomial coefficient)

$$\log L(\theta) = (N_4 + N_5 + N_6) \log 2 + (2N_1 + N_4 + N_5) \log \theta_1 + (2N_2 + N_4 + N_6) \log \theta_2 + (2N_3 + N_5 + N_6) \log(1 - \theta_1 - \theta_2).$$

By solving the equations

$$\frac{\partial \log L(\theta)}{\partial \theta_1} = 0 \quad \text{and} \quad \frac{\partial \log L(\theta)}{\partial \theta_2} = 0,$$

we obtain the results

$$\hat{\theta}_1 = \frac{2N_1 + N_4 + N_5}{2n} \quad \text{and} \quad \hat{\theta}_2 = \frac{2N_2 + N_4 + N_6}{2n},$$

where $n = \sum_{i=1}^{6} N_i$.

(b) For the given values, $n = 150$, $\hat{\theta}_1 = 0.2$, and $\hat{\theta}_2 = 0.5$. Therefore, we obtain the following table:

    i    $N_i$    $n\pi_i(\hat{\theta})$
    1      2         6
    2     36        37.5
    3     14        13.5
    4     36        30
    5     20        18
    6     42        45

It is found from Eq. (10.2.4) that $Q = 4.37$. If $Q$ has the $\chi^2$ distribution with 6 - 1 - 2 = 3 degrees
of freedom, then the value of $\Pr(Q > 4.37)$ is approximately 0.226.


4. Suppose that $X$ has the normal distribution with mean 67.6 and variance 1, and that $Z$ has the standard
normal distribution. Then:

    $\Pr(X < 66) = \Pr(Z < -1.6) = 0.0548$,
    $\Pr(66 < X < 67.5) = \Pr(-1.6 < Z < -0.1) = 0.4054$,
    $\Pr(67.5 < X < 68.5) = \Pr(-0.1 < Z < 0.9) = 0.3557$,
    $\Pr(68.5 < X < 70) = \Pr(0.9 < Z < 2.4) = 0.1759$,
    $\Pr(X > 70) = \Pr(Z > 2.4) = 0.0082$.

Therefore, we obtain the following table:

    i    $N_i$    $n\pi_i(\hat{\mu}, \hat{\sigma}^2)$
    1     18       27.4
    2    177      202.7
    3    198      177.85
    4    102       87.95
    5      5        4.1

The value of $Q$ is found from Eq. (10.2.4) to be 11.2. Since $\mu$ and $\sigma^2$ are estimated from the original
observations rather than from the grouped data, the approximate distribution of $Q$ when $H_0$ is true lies
between the $\chi^2$ distribution with 2 degrees of freedom and a $\chi^2$ distribution with 4 degrees of freedom.


5. From the given observations, it is found that the M.L.E. of the mean $\theta$ of the Poisson distribution is
$\hat{\theta} = \bar{X}_n = 1.5$. From the table of the Poisson distribution with $\theta = 1.5$, we can obtain the values of
$\pi_i(\hat{\theta})$. In turn, we can then obtain the following table:

    No. of tickets    $N_i$    $n\pi_i(\hat{\theta})$
    0                  52       44.62
    1                  60       66.94
    2                  55       50.20
    3                  18       25.10
    4                   8        9.42
    5 or more           7        3.70

It is found from Eq. (10.2.4) that $Q = 7.56$. Since $\hat{\theta}$ is calculated from the original observations rather
than from the grouped data, the approximate distribution of $Q$ when $H_0$ is true lies between the $\chi^2$
distribution with 4 degrees of freedom and the $\chi^2$ distribution with 5 degrees of freedom. The two
p-values for 4 and 5 degrees of freedom are 0.1091 and 0.1822.
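The pair of bounding p-values can be computed without a table. For an integer number of degrees of freedom, the tail area of the chi-square distribution has a finite-sum closed form, which the Python sketch below implements with the standard library only. The function name chi_square_tail is an illustrative choice, not something from the text or from R.

```python
from math import erfc, exp, factorial, gamma, sqrt


def chi_square_tail(x, df):
    """Pr(Q > x) when Q has the chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: a Poisson-type sum with df/2 terms.
        return exp(-half) * sum(half ** j / factorial(j) for j in range(df // 2))
    # Odd df: start from the 1-df tail erfc(sqrt(x/2)) and add half-integer terms.
    m = (df - 1) // 2
    return erfc(sqrt(half)) + exp(-half) * sum(
        half ** (j - 0.5) / gamma(j + 0.5) for j in range(1, m + 1)
    )


q = 7.56
p_lower_df = chi_square_tail(q, 4)   # bound using 4 degrees of freedom
p_upper_df = chi_square_tail(q, 5)   # bound using 5 degrees of freedom
print(round(p_lower_df, 4), round(p_upper_df, 4))
```

The same function can be reused for any of the other bounding pairs in this section by changing the statistic and the two degrees of freedom.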


6. The value of $\hat{\theta} = \bar{X}_n$ can be found explicitly from the given data, and it equals 3.872. However, before
carrying out the $\chi^2$ test, the observations in the bottom few rows of the table should be grouped together
to obtain a single cell in which the expected number of observations is not too small. Reasonable choices
would be to consider a single cell for the periods in which 11 or more particles were emitted (there
would be 6 observations in that cell) or to consider a single cell for the periods in which 10 or more
particles were emitted (there would be 16 observations in that cell). If the total number of cells after
this grouping has been made is $k$, then under $H_0$ the statistic $Q$ will have a distribution which lies
between the $\chi^2$ distribution with $k - 2$ degrees of freedom and the $\chi^2$ distribution with $k - 1$ degrees
of freedom. For example, with $k = 12$, the expected cell counts are

54.3, 210.3, 407.1, 525.3, 508.4, 393.7, 254.0, 140.5, 68.0, 29.2, 11.3, 5.8.

The statistic $Q$ is then 12.96. The two p-values for 10 and 11 degrees of freedom are 0.2258 and 0.2959.


7. There is no single correct answer to this problem. The M.L.E.’s $\hat{\mu} = \bar{X}_n$ and $\hat{\sigma}^2 = S_n^2/n$ should be
calculated from the given observations. These observations should then be grouped into intervals and
the observed number in each interval compared with the expected number in that interval if each of
the 50 observations had the normal distribution with mean $\bar{X}_n$ and variance $S_n^2/n$. If the number of
intervals is $k$, then when $H_0$ is true, the approximate distribution of the statistic $Q$ will lie between the
$\chi^2$ distribution with $k - 3$ degrees of freedom and the $\chi^2$ distribution with $k - 1$ degrees of freedom.


8. There is no single correct answer to this problem. The M.L.E. $\hat{\beta} = 1/\bar{X}_n$ of the parameter of the
exponential distribution should be calculated from the given observations. These observations should
then be grouped into intervals and the observed number in each interval compared with the expected
number in that interval if each of the 50 observations had an exponential distribution with parameter
$1/\bar{X}_n$. If the number of intervals is $k$, then when $H_0$ is true, the approximate distribution of the statistic
$Q$ will lie between a $\chi^2$ distribution with $k - 2$ degrees of freedom and the $\chi^2$ distribution with $k - 1$
degrees of freedom.


10.3 Contingency Tables


Solutions to Exercises. 


1. Table S.10.1 contains the expected counts for this example. The value of the $\chi^2$ statistic $Q$ calculated
from these data is $Q = 21.5$. This should be compared to the $\chi^2$ distribution with two degrees of
freedom. The tail area can be calculated using statistical software as $2.2 \times 10^{-5}$.

Table S.10.1: Expected cell counts for Exercise 1 of Sec. 10.3.

Good grades | Athletic ability
73
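2. $$Q = \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{(N_{ij} - \hat{E}_{ij})^2}{\hat{E}_{ij}} = \sum_{i=1}^{R} \sum_{j=1}^{C} \left( \frac{N_{ij}^2}{\hat{E}_{ij}} - 2N_{ij} + \hat{E}_{ij} \right) = \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{N_{ij}^2}{\hat{E}_{ij}} - 2n + n = \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{N_{ij}^2}{\hat{E}_{ij}} - n.$$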

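3. By Exercise 2,

$$Q = \sum_{i=1}^{R} \frac{N_{i1}^2}{\hat{E}_{i1}} + \sum_{i=1}^{R} \frac{N_{i2}^2}{\hat{E}_{i2}} - n.$$

But

$$\sum_{i=1}^{R} \frac{N_{i2}^2}{\hat{E}_{i2}} = \sum_{i=1}^{R} \frac{(N_{i+} - N_{i1})^2}{\hat{E}_{i2}} = \sum_{i=1}^{R} \frac{N_{i+}^2}{\hat{E}_{i2}} - 2 \sum_{i=1}^{R} \frac{N_{i+} N_{i1}}{\hat{E}_{i2}} + \sum_{i=1}^{R} \frac{N_{i1}^2}{\hat{E}_{i2}}.$$

In the first two sums on the right, we let $\hat{E}_{i2} = N_{i+} N_{+2}/n$, and in the third sum we let $\hat{E}_{i2} =
N_{+2} \hat{E}_{i1}/N_{+1}$. We then obtain

$$\sum_{i=1}^{R} \frac{N_{i2}^2}{\hat{E}_{i2}} = \frac{n}{N_{+2}} \sum_{i=1}^{R} N_{i+} - \frac{2n}{N_{+2}} \sum_{i=1}^{R} N_{i1} + \frac{N_{+1}}{N_{+2}} \sum_{i=1}^{R} \frac{N_{i1}^2}{\hat{E}_{i1}}.$$

It follows that

$$Q = \left( 1 + \frac{N_{+1}}{N_{+2}} \right) \sum_{i=1}^{R} \frac{N_{i1}^2}{\hat{E}_{i1}} + \frac{n}{N_{+2}} (n - 2N_{+1} - N_{+2}).$$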


4. The values of $\hat{E}_{ij}$ are as given in the following table:

     8    32
    12    48

The value of $Q$ is found from Eq. (10.3.4) to be 25/6. If $Q$ has a $\chi^2$ distribution with 1 degree of
freedom, then $\Pr(Q > 25/6)$ lies between 0.025 and 0.05.


5. The values of $\hat{E}_{ij}$ are as given in the following table.

    77.27    94.35    49.61    22.77
    17.73    21.65    11.39     5.23

The value of $Q$ is found from Eq. (10.3.4) to be 8.6. If $Q$ has the $\chi^2$ distribution with (2 - 1)(4 - 1) = 3
degrees of freedom, then $\Pr(Q > 8.6)$ lies between 0.025 and 0.05.


6. The values of $\hat{E}_{ij}$ are as given in the following table:


Copyright © 2012 Pearson Education, Inc. Publishing as Addison-Wesley. 


322 Chapter 10. Categorical Data and Nonparametric Methods 


7.0 7.9 
14.5 14.5 


The value of $Q$ is found from Eq. (10.3.4) to be 0.91. If $Q$ has the $\chi^2$ distribution with 1 degree of
freedom, then $\Pr(Q > 0.91)$ lies between 0.3 and 0.4.


7. (a) The values of $p_{i+}$ and $p_{+j}$ are the marginal totals given in the following table:

    0.15    0.09    0.06  |  0.3
    0.15    0.09    0.06  |  0.3
    0.20    0.12    0.08  |  0.4
    ----------------------+-----
    0.5     0.3     0.2   |  1.0

It can be verified that $p_{ij} = p_{i+} p_{+j}$ for each of the 9 entries in the table. It can be seen in advance
that this relation will be satisfied for every entry in the table because it can be seen that the
three rows of the table are proportional to each other or, equivalently, that the three columns are
proportional to each other.

(b) Here is one example of a simulated data set (only the marginal totals are shown):

                              Total
                                92
                                85
                               123
    Total   152    90    58    300

(c) The statistic $Q$ calculated by any student from Eq. (10.3.4) will have the $\chi^2$ distribution with
(3 - 1)(3 - 1) = 4 degrees of freedom. For the data in part (b), the table of $\hat{E}_{ij}$ values is

    46.61    27.60    17.79
    43.07    25.50    16.43
    62.32    36.90    23.78

The value of $Q$ is then 2.105. The p-value is 0.7165.
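Parts (b) and (c) can be carried out in a few lines of code. The Python sketch below draws 300 observations from the independent row and column distributions with marginal probabilities (0.3, 0.3, 0.4) and (0.5, 0.3, 0.2), then computes Q from Eq. (10.3.4). The seed and the function names are arbitrary assumptions, so the simulated table will not match the example data set in part (b).

```python
import random

row_probs = [0.3, 0.3, 0.4]
col_probs = [0.5, 0.3, 0.2]


def simulate_table(n, rng):
    """Simulate n observations with independent row and column classifications."""
    table = [[0] * len(col_probs) for _ in row_probs]
    for _ in range(n):
        i = rng.choices(range(len(row_probs)), weights=row_probs)[0]
        j = rng.choices(range(len(col_probs)), weights=col_probs)[0]
        table[i][j] += 1
    return table


def independence_statistic(table):
    """Q = sum of (N_ij - E_ij)^2 / E_ij with E_ij = N_i+ N_+j / n."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    q = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            q += (count - expected) ** 2 / expected
    return q


rng = random.Random(2012)            # arbitrary seed, for reproducibility only
table = simulate_table(300, rng)
q = independence_statistic(table)
print(table, round(q, 3))
```

Repeating the simulation with different seeds gives a collection of Q values that can be compared with the chi-square distribution with 4 degrees of freedom, as described in Exercise 8.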


8. To test whether the values obtained by $n$ different students form a random sample of size $n$ from a $\chi^2$
distribution with 4 degrees of freedom, follow these steps: (1) Partition the positive part of the real line
into $k$ intervals; (2) Determine the probabilities $p_1^0, \ldots, p_k^0$ of these intervals for the $\chi^2$ distribution with
4 degrees of freedom; (3) Calculate the value of the statistic $Q$ given by Eq. (10.1.2). If the hypothesis
$H_0$ is true, this statistic $Q$ will have approximately the $\chi^2$ distribution with $k - 1$ degrees of freedom.


9. Let $N_{ijk}$ denote the number of observations in the random sample that fall into the $(i, j, k)$ cell, and let

$$N_{i++} = \sum_{j=1}^{C} \sum_{k=1}^{T} N_{ijk}, \quad N_{+j+} = \sum_{i=1}^{R} \sum_{k=1}^{T} N_{ijk}, \quad N_{++k} = \sum_{i=1}^{R} \sum_{j=1}^{C} N_{ijk}.$$

Then the M.L.E.’s are

$$\hat{p}_{i++} = \frac{N_{i++}}{n}, \quad \hat{p}_{+j+} = \frac{N_{+j+}}{n}, \quad \hat{p}_{++k} = \frac{N_{++k}}{n}.$$

Therefore, when $H_0$ is true,

$$\hat{E}_{ijk} = n \hat{p}_{i++} \hat{p}_{+j+} \hat{p}_{++k} = \frac{N_{i++} N_{+j+} N_{++k}}{n^2}.$$

Since $\sum_{i=1}^{R} \hat{p}_{i++} = \sum_{j=1}^{C} \hat{p}_{+j+} = \sum_{k=1}^{T} \hat{p}_{++k} = 1$, the number of parameters that have been estimated
is $(R - 1) + (C - 1) + (T - 1) = R + C + T - 3$. Therefore, when $H_0$ is true, the approximate distribution
of

$$Q = \sum_{i=1}^{R} \sum_{j=1}^{C} \sum_{k=1}^{T} \frac{(N_{ijk} - \hat{E}_{ijk})^2}{\hat{E}_{ijk}}$$

will be the $\chi^2$ distribution for which the number of degrees of freedom is $RCT - 1 - (R + C + T - 3) =
RCT - R - C - T + 2$.


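10. The M.L.E.’s are

$$\hat{p}_{ij+} = \frac{N_{ij+}}{n} \quad \text{and} \quad \hat{p}_{++k} = \frac{N_{++k}}{n}.$$

Therefore, when $H_0$ is true,

$$\hat{E}_{ijk} = n \hat{p}_{ij+} \hat{p}_{++k} = \frac{N_{ij+} N_{++k}}{n}.$$

Since $\sum_{i=1}^{R} \sum_{j=1}^{C} \hat{p}_{ij+} = \sum_{k=1}^{T} \hat{p}_{++k} = 1$, the number of parameters that have been estimated is $(RC - 1) +
(T - 1) = RC + T - 2$. Therefore, when $H_0$ is true, the approximate distribution of

$$Q = \sum_{i=1}^{R} \sum_{j=1}^{C} \sum_{k=1}^{T} \frac{(N_{ijk} - \hat{E}_{ijk})^2}{\hat{E}_{ijk}}$$

will be the $\chi^2$ distribution for which the number of degrees of freedom is $RCT - 1 - (RC + T - 2) =
RCT - RC - T + 1$.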


10.4 Tests of Homogeneity 


Solutions to Exercises. 


1. Table S.10.2 contains the expected cell counts. The value of the $\chi^2$ statistic is $Q = 18.8$, which should
be compared to the $\chi^2$ distribution with four degrees of freedom. The tail area is $8.5 \times 10^{-4}$.

Table S.10.2: Expected cell counts for Exercise 1 of Sec. 10.4.

Popularity
Rural 0
Suburban 15
Urban 5.5
2. The value of the statistic $Q$ given by Eqs. (10.4.3) and (10.4.4) is 7.57. If $Q$ has a $\chi^2$ distribution with
(2 - 1)(3 - 1) = 2 degrees of freedom, then $\Pr(Q > 7.57) < 0.025$.


3. The value of the statistic $Q$ given by Eqs. (10.4.3) and (10.4.4) is 18.9. If $Q$ has the $\chi^2$ distribution
with (4 - 1)(5 - 1) = 12 degrees of freedom, then the value of $\Pr(Q > 18.9)$ lies between 0.05 and 0.1.


4. The table to be analyzed is as follows:

    Person    Hits    Misses
      1         8        9
      2         4       12
      3         7        3
      4        13       11
      5        10        6

The value of the statistic $Q$ given by Eqs. (10.4.3) and (10.4.4) is 6.8. If $Q$ has the $\chi^2$ distribution with
(5 - 1)(2 - 1) = 4 degrees of freedom, then the value of $\Pr(Q > 6.8)$ lies between 0.1 and 0.2.


5. The correct table to be analyzed is as follows:

    Supplier    Defectives    Nondefectives
       1            1              14
       2            7               8
       3            7               8

The value of $Q$ found from this table is 7.2. If $Q$ has the $\chi^2$ distribution with (3 - 1)(2 - 1) = 2 degrees
of freedom, then $\Pr(Q > 7.2) < 0.05$.


6. The proper table to be analyzed is as follows:

                               After demonstration
                               Hit        Miss
    Before           Hit                             27
    demonstration    Miss                            73
                                35         65

Although we are given the marginal totals, we are not given the entries in the table. If we were told
the value in just a single cell, such as the number of students who hit the target both before and after
the demonstration, we could fill in the rest of the table.


7. The proper table to be analyzed is as follows:

                                        After meeting
                               Favors A    Favors B    No preference
               Favors A
    Before     Favors B
    meeting    No preference

Each person who attended the meeting can be classified in one of the nine cells of this table. If a speech
was made on behalf of A at the meeting, we could evaluate the effectiveness of the speech by comparing
the numbers of persons who switched from favoring B or having no preference before the meeting to
favoring A after the meeting with the number who switched from favoring A before the meeting to one
of the other positions after the meeting.
10.5 Simpson’s Paradox


Solutions to Exercises 


1. If population II has a relatively high proportion of men and population I has a relatively high proportion
of women, then the indicated result will occur. For example, if 90 percent of population II are men and
10 percent are women, then the proportion of population II with the characteristic will be (.9)(.6) +
(.1)(.1) = .55. If 10 percent of population I are men and 90 percent are women, then the proportion of
population I with the characteristic will be only (.1)(.8) + (.9)(.3) = .35.


2. Each of these equalities holds if and only if A and B are independent events. 


3. Assume that $\Pr(B|A) = \Pr(B|A^c)$. This means that $A$ and $B$ are independent. According to the law
of total probability, we can write

$$\Pr(I|B) = \Pr(I|A \cap B) \Pr(A|B) + \Pr(I|A^c \cap B) \Pr(A^c|B) = \Pr(I|A \cap B) \Pr(A) + \Pr(I|A^c \cap B) \Pr(A^c),$$

where the last equality follows from the fact that $A$ and $B$ are independent. Similarly,

$$\Pr(I|B^c) = \Pr(I|A \cap B^c) \Pr(A) + \Pr(I|A^c \cap B^c) \Pr(A^c).$$

If the first two inequalities in (10.5.1) hold then the weighted average of the left sides of the inequalities
must be larger than the same weighted average of the right sides. In particular,

$$\Pr(I|A \cap B) \Pr(A) + \Pr(I|A^c \cap B) \Pr(A^c) > \Pr(I|A \cap B^c) \Pr(A) + \Pr(I|A^c \cap B^c) \Pr(A^c).$$

But, we have just shown that this last inequality is equivalent to $\Pr(I|B) > \Pr(I|B^c)$, which means that
the third inequality cannot hold if the first two hold.


4. Define $A$ to be the event that a subject is a man, $A^c$ the event that a subject is a woman, $B$ the
event that a subject receives treatment I, and $B^c$ the event that a subject receives treatment II. Then
the relation to be proved here is precisely the same as the relation that was proved in Exercise 2 in
symbols.


5. Suppose that the first two inequalities in (10.5.1) hold, and that $\Pr(A|B) = \Pr(A|B^c)$. Then

$$\begin{aligned}
\Pr(I|B) &= \Pr(I|A \cap B) \Pr(A|B) + \Pr(I|A^c \cap B) \Pr(A^c|B) \\
&> \Pr(I|A \cap B^c) \Pr(A|B) + \Pr(I|A^c \cap B^c) \Pr(A^c|B) \\
&= \Pr(I|A \cap B^c) \Pr(A|B^c) + \Pr(I|A^c \cap B^c) \Pr(A^c|B^c) \\
&= \Pr(I|B^c).
\end{aligned}$$

Hence, the final inequality in (10.5.1) must be reversed.


6. This result can be obtained if the colleges that admit a relatively small proportion of their applicants
receive a relatively large proportion of female applicants and the colleges that admit a relatively large
proportion of their applicants receive a relatively small proportion of female applicants. As a specific
example, suppose that the data are as given in the following table:

                Proportion
    University  of total      Proportion   Proportion   Proportion of    Proportion of
    College     applicants    male         female       males admitted   females admitted
       1           .1            .9           .1             .32              .56
       2           .1            .9           .1             .32              .56
       3           .2            .8           .2             .32              .56
       4           .2            .8           .2             .32              .56
       5           .4            .1           .9             .05              .10

This table indicates, for example, that College 1 receives 10 percent of all the applications submitted
to the university, that 90 percent of the applicants to College 1 are male and 10 percent are female,
that 32 percent of the male applicants to College 1 are admitted, and that 56 percent of the female
applicants are admitted. It can be seen from the last two columns of this table that in each college the
proportion of females admitted is larger than the proportion of males admitted. However, in the whole
university, the proportion of males admitted is

$$\frac{(.1)(.9)(.32) + (.1)(.9)(.32) + (.2)(.8)(.32) + (.2)(.8)(.32) + (.4)(.1)(.05)}{(.1)(.9) + (.1)(.9) + (.2)(.8) + (.2)(.8) + (.4)(.1)} = .30$$

and the proportion of females admitted is

$$\frac{(.1)(.1)(.56) + (.1)(.1)(.56) + (.2)(.2)(.56) + (.2)(.2)(.56) + (.4)(.9)(.10)}{(.1)(.1) + (.1)(.1) + (.2)(.2) + (.2)(.2) + (.4)(.9)} = .20.$$
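The aggregate admission rates are weighted averages of the college-level rates, with weights given by each college's share of the applicants of that sex. The Python sketch below recomputes both rates from the five rows of the example; the tuple layout and variable names are illustrative choices.

```python
# Each row: (share of all applicants, proportion male,
#            male admission rate, female admission rate)
colleges = [
    (0.1, 0.9, 0.32, 0.56),
    (0.1, 0.9, 0.32, 0.56),
    (0.2, 0.8, 0.32, 0.56),
    (0.2, 0.8, 0.32, 0.56),
    (0.4, 0.1, 0.05, 0.10),
]

# Fractions of all applicants who are male (or female) and admitted.
male_admitted = sum(s * m * am for s, m, am, af in colleges)
female_admitted = sum(s * (1 - m) * af for s, m, am, af in colleges)

# Fractions of all applicants who are male (or female).
male_total = sum(s * m for s, m, am, af in colleges)
female_total = sum(s * (1 - m) for s, m, am, af in colleges)

male_rate = male_admitted / male_total
female_rate = female_admitted / female_total
print(round(male_rate, 2), round(female_rate, 2))
```

Every college admits women at a higher rate than men, yet the university-wide rate for men comes out higher, because most female applicants apply to College 5, which admits few applicants of either sex.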


7. (a) Table S.10.3 shows the proportions helped by each treatment in the four categories of subjects.
The proportion helped by Treatment II is higher in each category.

Table S.10.3: Table for Exercise 7a in Sec. 10.5.

                         Proportion helped
    Category           Treatment I   Treatment II
    Older males           .200          .667
    Younger males         .750          .800
    Older females         .167          .286
    Younger females       .500          .640

(b) Table S.10.4 shows the proportions helped by each treatment in the two aggregated categories.
Treatment I helps a larger proportion in each of the two categories.

Table S.10.4: Table for Exercise 7b in Sec. 10.5.

                         Proportion helped
    Category           Treatment I   Treatment II
    Older subjects        .433          .400
    Younger subjects      .700          .667

(c) When all subjects are grouped together, the proportion helped by Treatment I is 200/400 = 0.5,
while the proportion helped by Treatment II is 240/400 = 0.6.




10.6 Kolmogorov-Smirnov Tests 


Commentary 


This section is optional. However, some of the topics discussed here are useful in Chapter 12. In particular, the 
bootstrap in Sec. 12.6 makes much use of the sample c.d.f. Some of the plots done after Markov chain Monte 
Carlo also make use of the sample c.d.f. The crucial material is at the start of Sec. 10.6. The Glivenko-Cantelli 
lemma, together with the asymptotic distribution of the Kolmogorov-Smirnov test statistic in Table 10.32 
are useful if one simulates sample c.d.f.’s and wishes to compute simulation standard errors for the entire 
sample c.d.f. 

Empirical c.d.f.’s can be computed by the R function ecdf. The argument is a vector of data values. The
result is an R function that computes the empirical c.d.f. at its argument. For example, if x contains a sample
of observations, then empd.x=ecdf(x) will create a function empd.x which can be used to compute values
of the empirical c.d.f.; empd.x(3) will then be the proportion of the sample with values at most 3.
Kolmogorov-Smirnov tests can be performed using the R function ks.test. The first argument is a vector of
data values. The second argument depends on whether one is doing a one-sample or two-sample test. In the
two-sample case, the second argument is the second sample. In the one-sample case, the second argument
is the name of a function that will compute the hypothesized c.d.f. If that function has any additional
arguments, they can be provided next or named explicitly later in the argument list.
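For readers who want to see what these two computations involve, here is a minimal Python sketch of an empirical c.d.f. and of the one-sample statistic, the supremum of |F_n(x) - F(x)|. It is not the R implementation; the function names and the small data vector are assumptions made for illustration.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function giving the proportion of the sample that is <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(x):
        return bisect_right(ordered, x) / n

    return cdf


def ks_statistic(sample, hypothesized_cdf):
    """sup over x of |F_n(x) - F(x)| for a continuous hypothesized c.d.f. F.

    The supremum is reached at an order statistic y_i, either at the jump
    (i/n - F(y_i)) or just below it (F(y_i) - (i - 1)/n).
    """
    ordered = sorted(sample)
    n = len(ordered)
    d = 0.0
    for i, y in enumerate(ordered, start=1):
        f = hypothesized_cdf(y)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


x = [1.2, 3.0, 0.4, 3.0, 5.1]        # hypothetical data
empd_x = ecdf(x)
print(empd_x(3))                     # proportion of the sample at most 3

# One-sample statistic against the uniform c.d.f. on (0, 1).
d = ks_statistic([0.1, 0.5, 0.9], lambda t: min(max(t, 0.0), 1.0))
print(round(d, 4))
```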


Solutions to Exercises. 


1. $F_n(x) = 0$ for $x < y_1$, and $F_n(y_1) = 0.2$. Suppose first that $F(y_1) \geq 0.1$. Since $F$ is continuous, the values
of $F(x)$ will be arbitrarily close to $F(y_1)$ for $x$ arbitrarily close to $y_1$. Therefore, $\sup_{x < y_1} |F_n(x) - F(x)| =
F(y_1) \geq 0.1$, and it follows that $D_n \geq 0.1$. Suppose next that $F(y_1) \leq 0.1$. Since $F_n(y_1) = 0.2$, it
follows that $|F_n(y_1) - F(y_1)| \geq 0.1$. Therefore, it is again true that $D_n \geq 0.1$. We can now conclude
that it must always be true that $D_n \geq 0.1$. If the values of $F(y_i)$ are as specified in the second part of
the exercise, for $i = 1, \ldots, 5$, then:

$$|F_n(x) - F(x)| = \begin{cases}
F(x) < 0.1 & \text{for } x < y_1, \\
0.2 - 0.1 = 0.1 & \text{for } x = y_1, \\
|F(x) - 0.2| < 0.1 & \text{for } y_1 < x < y_2, \\
0.4 - 0.3 = 0.1 & \text{for } x = y_2, \\
|F(x) - 0.4| < 0.1 & \text{for } y_2 < x < y_3, \\
\text{etc.}
\end{cases}$$

Hence, $D_n = \sup_{-\infty < x < \infty} |F_n(x) - F(x)| = 0.1$.


-o<zr< wo 


2. $$F_n(x) = \begin{cases}
0 & \text{for } x < y_1, \\
0.2 & \text{for } y_1 \leq x < y_2, \\
0.4 & \text{for } y_2 \leq x < y_3, \\
0.6 & \text{for } y_3 \leq x < y_4, \\
0.8 & \text{for } y_4 \leq x < y_5, \\
1 & \text{for } x \geq y_5.
\end{cases}$$

If $F$ satisfies the inequalities given in the exercise, then $|F_n(x) - F(x)| \leq 0.2$ for every value of $x$.
Hence, $D_n \leq 0.2$. Conversely, if $F(y_i) > 0.2i$ for some value of $i$, then $F(x) - F_n(x) > 0.2$ for values
of $x$ approaching $y_i$ from below. Hence, $D_n > 0.2$. Also, if $F(y_i) < 0.2(i - 1)$ for some value of $i$, then
$F_n(y_i) - F(y_i) > 0.2$. Hence, again $D_n > 0.2$.




3. The largest value of the difference between the sample c.d.f. and the c.d.f. of the normal distribution
with mean 3.912 and variance 0.25 occurs right before $x = 4.22$, the 12th observation. For $x$ just below
4.22, the sample c.d.f. is $F_n(x) = 0.48$, while the normal c.d.f. is $\Phi([4.22 - 3.912]/0.5) = 0.73$. The
difference is $D_n^* = 0.25$. The Kolmogorov-Smirnov test statistic is $23^{1/2} \times 0.25 = 1.2$. The tail area can
be found from Table 10.32 as 0.11.


4. When the observations are ordered, we obtain Table S.10.5. The maximum value of $|F_n(x) - F(x)|$
occurs at $x = y_{15}$ where its value is 0.60 - 0.42 = 0.18. Since $n = 25$, $n^{1/2} D_n^* = 0.90$. From
Table 10.32, $H(0.90) = 0.6073$. Therefore, the tail area corresponding to the observed value of $D_n^*$ is
1 - 0.6073 = 0.3927.

Table S.10.5: Table for Exercise 4 in Sec. 10.6.

     i   $y_i = F(y_i)$   $F_n(y_i)$       i   $y_i = F(y_i)$   $F_n(y_i)$
     1       .01            .04           14       .41            .56
     2       .06            .08           15       .42            .60
     3       .08            .12           16       .48            .64
     4       .09            .16           17       .57            .68
     5       .11            .20           18       .66            .72
     6       .16            .24           19       .71            .76
     7       .22            .28           20       .75            .80
     8       .23            .32           21       .78            .84
     9       .29            .36           22       .79            .88
    10       .30            .40           23       .82            .92
    11       .35            .44           24       .88            .96
    12       .38            .48           25       .90           1.00
    13       .40            .52
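The values of H(t) used here can also be computed from the series for the limiting distribution of the Kolmogorov-Smirnov statistic, H(t) = 1 - 2 * sum over i >= 1 of (-1)^(i-1) exp(-2 i^2 t^2). The Python sketch below evaluates that series directly; the function name and the stopping rule are assumptions, and the inputs are the statistic values 0.90 and 0.75 that arise in Exercises 4 and 5.

```python
from math import exp


def kolmogorov_cdf(t, max_terms=10000):
    """Limiting c.d.f. H(t) = 1 - 2 * sum_{i>=1} (-1)^(i-1) * exp(-2 i^2 t^2)."""
    if t <= 0:
        return 0.0
    total = 0.0
    for i in range(1, max_terms + 1):
        term = exp(-2.0 * i * i * t * t)
        total += term if i % 2 == 1 else -term
        if term < 1e-16:          # remaining terms are negligible
            break
    return min(1.0, max(0.0, 1.0 - 2.0 * total))


h_090 = kolmogorov_cdf(0.90)
h_075 = kolmogorov_cdf(0.75)
print(round(h_090, 4), round(1 - h_090, 4))
print(round(h_075, 4), round(1 - h_075, 4))
```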


5. Here,

$$F(x) = \begin{cases}
\frac{3}{2} x & \text{for } 0 < x \leq 1/2, \\
\frac{1}{2}(1 + x) & \text{for } 1/2 < x < 1.
\end{cases}$$

Therefore, we obtain Table S.10.6. The supremum of $|F_n(x) - F(x)|$ occurs as $x \to y_{18}$ from below.
Here, $F(x) \to 0.83$ while $F_n(x)$ remains at 0.68. Therefore, $D_n^* = 0.83 - 0.68 = 0.15$. It follows that
$n^{1/2} D_n^* = 0.75$ and, from Table 10.32, $H(0.75) = 0.3728$. Therefore, the tail area corresponding to the
observed value of $D_n^*$ is 1 - 0.3728 = 0.6272.

Table S.10.6: Table for Exercise 5 in Sec. 10.6.

     i   $y_i$   $F(y_i)$   $F_n(y_i)$       i   $y_i$   $F(y_i)$   $F_n(y_i)$
     1    .01     .015        .04           14    .41     .615        .56
     2    .06     .09         .08           15    .42     .63         .60
     3    .08     .12         .12           16    .48     .72         .64
     4    .09     .135        .16           17    .57     .785        .68
     5    .11     .165        .20           18    .66     .83         .72
     6    .16     .24         .24           19    .71     .855        .76
     7    .22     .33         .28           20    .75     .875        .80
     8    .23     .345        .32           21    .78     .89         .84
     9    .29     .435        .36           22    .79     .895        .88
    10    .30     .45         .40           23    .82     .91         .92
    11    .35     .525        .44           24    .88     .94         .96
    12    .38     .57         .48           25    .90     .95        1.00
    13    .40     .60         .52


6. Since the p.d.f. of the uniform distribution is identically equal to 1, the value of the joint p.d.f. of the
25 observations under the uniform distribution has the value $L_1 = 1$. Also, sixteen of the observations
are less than 1/2 and nine are greater than 1/2. Therefore, the value of the joint p.d.f. of the observations
under the other distribution is $L_2 = (3/2)^{16} (1/2)^{9} = 1.2829$. The posterior probability that the
observations came from a uniform distribution is

$$\frac{\frac{1}{2} L_1}{\frac{1}{2} L_1 + \frac{1}{2} L_2} = 0.438.$$
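The posterior probability is a one-line application of Bayes' theorem, and it is easy to confirm the arithmetic. The sketch below assumes equal prior probabilities of 1/2 for the two distributions, which is what the value 0.438 corresponds to; the variable names are illustrative.

```python
prior_uniform = 0.5            # assumed equal prior probabilities
prior_other = 0.5

l1 = 1.0                                   # joint p.d.f. under the uniform distribution
l2 = (3 / 2) ** 16 * (1 / 2) ** 9          # 16 observations below 1/2, 9 above

posterior_uniform = prior_uniform * l1 / (prior_uniform * l1 + prior_other * l2)
print(round(l2, 4), round(posterior_uniform, 3))
```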


7. We first replace each observed value $x_i$ by the value $(x_i - 26)/2$. Then, under the null hypothesis,
the transformed values will form a random sample from a standard normal distribution. When these
transformed values are ordered, we obtain Table S.10.7. The maximum value of $|F_n(x) - \Phi(x)|$
is attained at $x = y_{23}$ and its value is 0.0649. Since $n = 50$, $n^{1/2} D_n^* = 0.453$. It follows from
Table 10.32 that $H(0.453) = 0.02$. Therefore, the tail area corresponding to the observed value of $D_n^*$
is 1 - 0.02 = 0.98.

Table S.10.7: Table for Exercise 7 in Sec. 10.6.

     i    $y_i$     $\Phi(y_i)$  $F_n(y_i)$      i    $y_i$     $\Phi(y_i)$  $F_n(y_i)$
     1   -2.2105     .0136        .02           26   -0.010      .4960        .52
     2   -1.9265     .0270        .04           27   -0.002      .4992        .54
     3   -1.492      .0675        .06           28    0.010      .5040        .56
     4   -1.3295     .0919        .08           29    0.1515     .5602        .58
     5   -1.309      .0953        .10           30    0.258      .6018        .60
     6   -1.2085     .1134        .12           31    0.280      .6103        .62
     7   -1.1995     .1152        .14           32    0.3075     .6208        .64
     8   -1.125      .1307        .16           33    0.398      .6547        .66
     9   -1.0775     .1417        .18           34    0.4005     .6556        .68
    10   -1.052      .1464        .20           35    0.4245     .6645        .70
    11   -0.961      .1682        .22           36    0.482      .6851        .72
    12   -0.8415     .2001        .24           37    0.614      .7304        .74
    13   -0.784      .2165        .26           38    0.689      .7546        .76
    14   -0.767      .2215        .28           39    0.7165     .7631        .78
    15   -0.678      .2482        .30           40    0.7265     .7662        .80
    16   -0.6285     .2648        .32           41    0.962      .8320        .82
    17   -0.548      .2919        .34           42    1.0645     .8564        .84
    18   -0.456      .3242        .36           43    1.120      .8686        .86
    19   -0.4235     .3359        .38           44    1.176      .8802        .88
    20   -0.340      .3669        .40           45    1.239      .8923        .90
    21   -0.3245     .3728        .42           46    1.4615     .9281        .92
    22   -0.309      .3787        .44           47    1.6315     .9487        .94
    23   -0.266      .3951        .46           48    1.7925     .9635        .96
    24   -0.078      .4689        .48           49    1.889      .9705        .98
    25   -0.0535     .4787        .50           50    2.216      .9866       1.00


8. We first replace each observed value $x_i$ by the value $(x_i - 24)/2$. Then, under the null hypothesis,
the transformed values will form a random sample from a standard normal distribution. Each of the
transformed values will be one unit larger than the corresponding transformed value in Exercise 7.
The ordered values are therefore omitted from the tabulation in Table S.10.8. The supremum of
$|F_n(x) - \Phi(x)|$ occurs as $x \to y_{18}$ from below. Here, $\Phi(x) \to 0.7068$ while $F_n(x)$ remains at 0.34.
Therefore, $D_n^* = 0.7068 - 0.34 = 0.3668$. It follows that $n^{1/2} D_n^* = 2.593$ and, from Table 10.32,
$H(2.593) = 1.0000$. Therefore, the tail area corresponding to the observed value of $D_n^*$ is 0.0000.

Table S.10.8: Table for Exercise 8 in Sec. 10.6.

     i   $\Phi(y_i)$  $F_n(y_i)$      i   $\Phi(y_i)$  $F_n(y_i)$
     1    .1130        .02           26    .8389        .52
     2    .1779        .04           27    .8408        .54
     3    .3114        .06           28    .8437        .56
     4    .3710        .08           29    .8752        .58
     5    .3787        .10           30    .8958        .60
     6    .4174        .12           31    .8997        .62
     7    .4209        .14           32    .9045        .64
     8    .4502        .16           33    .9189        .66
     9    .4691        .18           34    .9193        .68
    10    .4793        .20           35    .9229        .70
    11    .5136        .22           36    .9309        .72
    12    .5630        .24           37    .9467        .74
    13    .5856        .26           38    .9544        .76
    14    .5921        .28           39    .9570        .78
    15    .6263        .30           40    .9579        .80
    16    .6449        .32           41    .9751        .82
    17    .6743        .34           42    .9805        .84
    18    .7068        .36           43    .9830        .86
    19    .7178        .38           44    .9852        .88
    20    .7454        .40           45    .9875        .90
    21    .7503        .42           46    .9931        .92
    22    .7552        .44           47    .9958        .94
    23    .7685        .46           48    .9974        .96
    24    .8217        .48           49    .9980        .98
    25    .8280        .50           50    .9993       1.00


9. We shall denote the 25 ordered observations in the first sample by $x_1 < \cdots < x_{25}$ and shall denote
the 20 ordered observations in the second sample by $y_1 < \cdots < y_{20}$. We obtain Table S.10.9. The
maximum value of $|F_m(x) - G_n(x)|$ is attained at $x = -0.39$, where its value is 0.32 - 0.05 = 0.27.
Therefore, $D_{mn} = 0.27$ and, since $m = 25$ and $n = 20$, $(mn/[m + n])^{1/2} D_{mn} = 0.9$. From Table 10.32,
$H(0.9) = 0.6073$. Hence, the tail area corresponding to the observed value of $D_{mn}$ is 1 - 0.6073 = 0.3927.


10. We shall add 2 units to each of the values in the first sample and then carry out the same procedure
as in Exercise 9. We now obtain Table S.10.10. The maximum value of $|F_m(x) - G_n(x)|$ is attained
at $x = 1.56$, where its value is 0.80 - 0.24 = 0.56. Therefore, $D_{mn} = 0.56$ and $(mn/[m + n])^{1/2} D_{mn} =
1.8667$. From Table 10.32, $H(1.8667) = 0.998$. Therefore, the tail area corresponding to the observed
value of $D_{mn}$ is 1 - 0.998 = 0.002.


11. We shall multiply each of the observations in the second sample by 3 and then carry out the same
procedure as in Exercise 9. We now obtain Table S.10.11. The maximum value of $|F_m(x) - G_n(x)|$ is attained
at $x = 1.06$, where its value is 0.80 - 0.30 = 0.50. Therefore, $D_{mn} = 0.50$ and $(mn/[m + n])^{1/2} D_{mn} =
1.667$. From Table 10.32, $H(1.667) = 0.992$. Therefore, the tail area corresponding to the observed
value of $D_{mn}$ is 1 - 0.992 = 0.008.
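The three two-sample statistics in Exercises 9 to 11 can be recomputed directly. In the Python sketch below, the two samples are read back from the values printed in Table S.10.11 (first-sample values as printed, second-sample values divided by 3), so they should be treated as a reconstruction rather than as the original data listing. The function name is a hypothetical choice.

```python
from bisect import bisect_right
from math import sqrt

# Reconstructed samples (see the caveat above).
first = [-2.47, -1.73, -1.28, -0.82, -0.74, -0.56, -0.40, -0.39, -0.32, -0.06,
         0.05, 0.06, 0.29, 0.31, 0.51, 0.59, 0.61, 0.64, 1.05, 1.06,
         1.09, 1.31, 1.64, 1.77, 2.36]
second = [-0.71, -0.37, -0.30, -0.27, 0.00, 0.26, 0.36, 0.38, 0.44, 0.52,
          0.66, 0.70, 0.96, 1.38, 1.50, 1.56, 1.66, 2.20, 2.31, 3.29]


def two_sample_ks(xs, ys):
    """Return D_mn = sup |F_m - G_n| and the scaled value (mn/(m+n))^(1/2) D_mn."""
    xs, ys = sorted(xs), sorted(ys)
    m, n = len(xs), len(ys)
    d = 0.0
    for t in xs + ys:      # the supremum is attained at an observed value
        d = max(d, abs(bisect_right(xs, t) / m - bisect_right(ys, t) / n))
    return d, sqrt(m * n / (m + n)) * d


d9, t9 = two_sample_ks(first, second)                                  # Exercise 9
d10, t10 = two_sample_ks([round(x + 2, 2) for x in first], second)     # Exercise 10
d11, t11 = two_sample_ks(first, [round(3 * y, 2) for y in second])     # Exercise 11
print(round(d9, 2), round(t9, 4))
print(round(d10, 2), round(t10, 4))
print(round(d11, 2), round(t11, 4))
```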


12. The maximum difference between the c.d.f. of the normal distribution with mean 3.912 and variance
0.25 and the empirical c.d.f. of the observed data is $D_n^* = 0.2528$, which occurs at the observation 4.22
where the empirical c.d.f. jumps from 11/23 = 0.4783 to 12/23 = 0.5217 and the normal c.d.f. equals
$\Phi([4.22 - 3.912]/0.5) = 0.7311$. We now compare $(23)^{1/2} D_n^* = 1.2123$ to Table 10.32, where we find
that $H(1.2123) = 0.89$. The tail area (p-value) is then about 0.11.
Table S.10.9: Table for Exercise 9 in Sec. 10.6. (Columns: $x_i$, $y_j$, $F_m(x)$, $G_n(x)$.)

Table S.10.10: Table for Exercise 10 in Sec. 10.6. (Columns: $x_i$, $y_j$, $F_m(x)$, $G_n(x)$.)
Table S.10.11: Table for Exercise 11 in Sec. 10.6.

       x     $F_m(x)$  $G_n(x)$        x     $F_m(x)$  $G_n(x)$
     -2.47     .04       0            0.78     .72       .30
     -2.13     .04      .05           1.05     .76       .30
     -1.73     .08      .05           1.06     .80       .30
     -1.28     .12      .05           1.08     .80       .35
     -1.11     .12      .10           1.09     .84       .35
     -0.90     .12      .15           1.14     .84       .40
     -0.82     .16      .15           1.31     .88       .40
     -0.81     .16      .20           1.32     .88       .45
     -0.74     .20      .20           1.56     .88       .50
     -0.56     .24      .20           1.64     .92       .50
     -0.40     .28      .20           1.77     .96       .50
     -0.39     .32      .20           1.98     .96       .55
     -0.32     .36      .20           2.10     .96       .60
     -0.06     .40      .20           2.36    1.00       .60
      0.00     .40      .25           2.88    1.00       .65
      0.05     .44      .25           4.14    1.00       .70
      0.06     .48      .25           4.50    1.00       .75
      0.29     .52      .25           4.68    1.00       .80
      0.31     .56      .25           4.98    1.00       .85
      0.51     .60      .25           6.60    1.00       .90
      0.59     .64      .25           6.93    1.00       .95
      0.61     .68      .25           9.87    1.00      1.00
      0.64     .72      .25




10.7 Robust Estimation 


Commentary 


In recent years, interest has grown in the use of robust statistical methods. Although many robust methods
are more suitable for advanced courses, this section introduces some robust methods that can be understood
at the level of the rest of this text. This includes M-estimators of a location parameter.

The software R contains some functions that can be used for robust estimation. The function quantile
computes sample quantiles. The first argument is a vector of observed values. The second argument is a vector
of probabilities for the desired quantiles. For example, quantile(x,c(0.25,0.75)) computes the sample
quartiles of the data x. The function median computes the sample median. The function mad computes the
median absolute deviation of a sample. If you issue the command library(MASS), some additional functions
become available. One such function is huber, which computes M-estimators as on page 673 with $\hat{\sigma}$ equal
to the median absolute deviation. The first argument is the vector of data values, and the second argument
is $k$, in the notation of the text. To find the M-estimator with a general $\sigma$, replace the second argument by
$k\sigma$ divided by the median absolute deviation of the data.


Solutions to Exercises. 


1. The observed values ordered from smallest to largest are 2.1, 2.2, 21.3, 21.5, 21.7, 21.7, 21.8, 22.1, 22.1, 
22.2, 22.4, 22.5, 22.9, 23.0, 63.0. 


(a) The sample mean is the average of the numbers, 22.17. 


(b) The trimmed mean for a given value of $k$ is found by dropping $k$ values from each end of this
ordered sequence and averaging the remaining values. In this problem we get

    k                            1        2       3     4
    kth level trimmed mean     20.57    22.02    22    22

(c) The sample median is the middle observation, 22.1.


(d) The median absolute deviation is 0.4. Suppose that we start iterating with the sample average 
22.17. The 7th and 8th iterations are both 22. 
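The summaries in parts (a) through (c), and the median absolute deviation used in part (d), can be checked with a short script. This is a minimal sketch in Python (standard library only) rather than the R functions mentioned in the commentary; the helper names `trimmed_mean` and `median_abs_dev` are illustrative and not from the text.

```python
from statistics import mean, median

# Ordered observations from Exercise 1.
data = [2.1, 2.2, 21.3, 21.5, 21.7, 21.7, 21.8, 22.1, 22.1,
        22.2, 22.4, 22.5, 22.9, 23.0, 63.0]


def trimmed_mean(values, k):
    """Average after dropping the k smallest and the k largest values."""
    ordered = sorted(values)
    return mean(ordered[k:len(ordered) - k])


def median_abs_dev(values):
    """Median of the absolute deviations from the sample median (unscaled)."""
    m = median(values)
    return median(abs(v - m) for v in values)


sample_mean = mean(data)
trimmed = {k: trimmed_mean(data, k) for k in range(1, 5)}
sample_median = median(data)
mad = median_abs_dev(data)

print("mean:", round(sample_mean, 2))
print("trimmed means:", {k: round(v, 2) for k, v in trimmed.items()})
print("median:", sample_median)
print("median absolute deviation:", round(mad, 2))
```

The two extreme observations (2.1 and 63.0) move the sample mean much less than they would in a smaller sample, but the level-1 trimmed mean still shifts noticeably because 2.2 remains in the data after one value is dropped from each end.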


2. The observed values ordered from smallest to largest are -2.40, -2.00, -0.11, 0.00, 0.03, 0.10, 0.12,
0.23, 0.24, 0.24, 0.36, 0.69, 1.24, 1.78.


(a) The sample mean is the average of these values, 0.0371. 


(b) The trimmed mean for a given value of k is found by dropping k values from each end of this
ordered sequence and averaging the remaining values. In this problem we get 


k 1 2 3 4 
kth level trimmed mean | 0.095 0.19 0.165 0.16 


(c) Since the number of observed values is even, the sample median is the average of the two middle 
values 0.12 and 0.23, which equals 0.175. 


(d) The median absolute deviation is 0.18. Suppose that we start iterating with the sample average 
0.0371. The 9th and 10th iterations are both 0.165. 


3. The distribution of $\tilde{\theta}_{0.5,n}$ will be approximately normal with mean $\theta$ and standard deviation $1/[2n^{1/2} f(\theta)]$.
In this exercise, $f(\theta) = 1/\sqrt{2\pi}$. Since $n = 100$, the standard deviation of the approximate distribution of $\tilde{\theta}_{0.5,n}$
is $\sqrt{2\pi}/20 = 0.1253$. It follows that the distribution of $Z = (\tilde{\theta}_{0.5,n} - \theta)/0.1253$ will be approximately
standard normal. Thus,

$$\Pr(|\tilde{\theta}_{0.5,n} - \theta| < 0.1) = \Pr\left(|Z| < \frac{0.1}{0.1253}\right) = \Pr(|Z| < 0.798) = 2\Phi(0.798) - 1 = 0.575.$$

4. Here,

$$f(x) = \frac{1}{\pi[1 + (x - \theta)^2]}.$$

Therefore, $f(\theta) = 1/\pi$ and, since $n = 100$, it follows that the distribution of $\tilde{\theta}_{0.5,n}$ will be approximately
normal with mean $\theta$ and standard deviation $\pi/20 = 0.1571$. Thus, the distribution of $Z = (\tilde{\theta}_{0.5,n} - \theta)/0.1571$
will be approximately standard normal. Hence,

$$\Pr(|\tilde{\theta}_{0.5,n} - \theta| < 0.1) = \Pr\left(|Z| < \frac{0.1}{0.1571}\right) = \Pr(|Z| < 0.637) = 2\Phi(0.637) - 1 = 0.476.$$

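5. Let the first density on the right side of Eq. (10.7.1) be called h. Since both h and g are symmetric
with respect to $\mu$, so also is $f(x)$. Therefore, both the sample mean $\bar{X}_n$ and the sample median $\tilde{X}_n$ are
unbiased estimators of $\mu$. It follows that the M.S.E. of $\bar{X}_n$ is equal to $\mathrm{Var}(\bar{X}_n)$ and that the M.S.E. of
$\tilde{X}_n$ is equal to $\mathrm{Var}(\tilde{X}_n)$. The variance of a single observation X is

$$\mathrm{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx = \frac{1}{2}\int_{-\infty}^{\infty} (x - \mu)^2 h(x)\,dx + \frac{1}{2}\int_{-\infty}^{\infty} (x - \mu)^2 g(x)\,dx = \frac{1}{2}(1) + \frac{1}{2}(4) = \frac{5}{2}.$$

Since $n = 100$, $\mathrm{Var}(\bar{X}_n) = (1/100)(5/2) = 0.025$.
The variance of $\tilde{X}_n$ will be approximately $1/[4n f^2(\mu)]$. Since

$$h(x) = \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}(x - \mu)^2\right] \quad\text{and}\quad g(x) = \frac{1}{2\sqrt{2\pi}} \exp\left[-\frac{1}{2 \cdot 4}(x - \mu)^2\right],$$

it follows that

$$f(\mu) = \frac{1}{2}h(\mu) + \frac{1}{2}g(\mu) = \frac{1}{2}\cdot\frac{1}{\sqrt{2\pi}} + \frac{1}{2}\cdot\frac{1}{2\sqrt{2\pi}} = \frac{3}{4\sqrt{2\pi}}.$$

Therefore, $\mathrm{Var}(\tilde{X}_n)$ is approximately $2\pi/225 = 0.028$.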


6. Let $g_n(x)$ be the joint p.d.f. of the data given that they came from the uniform distribution, and
let $f_n(x)$ be the joint p.d.f. given that they come from the p.d.f. in Exercise 5. According to Bayes'
theorem, the posterior probability that they came from the uniform distribution is

$$\frac{\frac{1}{2} g_n(x)}{\frac{1}{2} g_n(x) + \frac{1}{2} f_n(x)}.$$

It is easy to see that $g_n(x) = 1$ for these data, while $f_n(x) = (3/2)^{16}(1/2)^{9} = 1.283$. This makes the
posterior probability of the uniform distribution $1/2.283 = 0.4380$.
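The arithmetic behind this posterior probability is short enough to check directly. The sketch below assumes equal prior probabilities of 1/2 for the two models and assumes that the joint density under the alternative is $(3/2)^{16}(1/2)^{9}$, the exponents that reproduce the stated value 1.283; the variable names are illustrative.

```python
# Assumed prior probability of the uniform model (the other model gets the rest).
prior_uniform = 0.5

# Joint densities of the observed data under each model.
g_n = 1.0                      # uniform model
f_n = 1.5 ** 16 * 0.5 ** 9     # alternative model; exponents are an assumption

# Bayes' theorem for two competing models.
posterior_uniform = (prior_uniform * g_n) / (
    prior_uniform * g_n + (1.0 - prior_uniform) * f_n
)

print(round(f_n, 3), round(posterior_uniform, 4))
```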
7. (a) The mean of $\bar{X}_n$ is the mean of each $X_i$. Since $f(x)$ is a weighted average of two other p.d.f.'s,
the mean $\int x f(x)\,dx$ is the same mixture of the means of the other two distributions. Since each of the
distributions in the mixture has mean $\mu$, so does the distribution with p.d.f. f.

(b) The variance of $\bar{X}_n$ is $1/n$ times the variance of $X_i$. The variance of $X_i$ is $E(X_i^2) - \mu^2$. Since the
p.d.f. of $X_i$ is a weighted average of two other p.d.f.'s, the mean of $X_i^2$ is the same weighted average
of the two means of $X_i^2$ from the two p.d.f.'s. The mean of $X_i^2$ from the first p.d.f. (the normal
distribution with mean $\mu$ and variance $\sigma^2$) is $\mu^2 + \sigma^2$. The mean of $X_i^2$ from the second p.d.f. (the
normal distribution with mean $\mu$ and variance $100\sigma^2$) is $\mu^2 + 100\sigma^2$. The weighted average is

$$(1 - \epsilon)(\mu^2 + \sigma^2) + \epsilon(\mu^2 + 100\sigma^2) = \mu^2 + \sigma^2(1 + 99\epsilon).$$

The variance of $X_i$ is then $(1 + 99\epsilon)\sigma^2$, and the variance of $\bar{X}_n$ is $(1 + 99\epsilon)\sigma^2/n$.


8. When $\epsilon = 1$, the distribution whose p.d.f. is in Eq. (10.7.2) is the normal distribution with mean $\mu$
and variance $100\sigma^2$. When $\epsilon = 0$, the distribution is the normal distribution with mean $\mu$ and variance
$\sigma^2$. The ratio of the variances of the sample mean and sample median from a normal distribution
does not depend on the variance of the normal distribution, hence the ratio will be the same whether
the variance is $\sigma^2$ or $100\sigma^2$. The reason that the ratio doesn't depend on the variance of the specific
normal distribution is that both the sample mean and the sample median have variances that equal
the variance of the original distribution times constants that depend only on the sample size.


9. The likelihood function is

$$\frac{1}{(2\sigma)^n} \exp\left(-\frac{1}{\sigma}\sum_{i=1}^{n} |x_i - \theta|\right).$$

It is easy to see that, no matter what $\sigma$ equals, the M.L.E. of $\theta$ is the number that minimizes $\sum_{i=1}^{n} |x_i - \theta|$.
This is the same as the number that minimizes $\sum_{i=1}^{n} |x_i - \theta|/n$. The value $\sum_{i=1}^{n} |x_i - \theta|/n$ is the mean of
$|X - \theta|$ when the c.d.f. of X is the sample c.d.f. of $X_1, \ldots, X_n$. The mean of $|X - \theta|$ is minimized by
$\theta$ equal to a median of the distribution of X according to Theorem 4.5.3. The median of the sample
distribution is the sample median.
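The claim that the sample median minimizes the sum of absolute deviations can be illustrated numerically. The sketch below uses a small hypothetical sample (not data from the text) and a brute-force grid search over candidate values of $\theta$; with an odd sample size the minimizer is unique, so the grid search should land exactly on the sample median.

```python
from statistics import median


def sum_abs_dev(xs, theta):
    """Sum of |x_i - theta|, the quantity the M.L.E. of theta minimizes."""
    return sum(abs(x - theta) for x in xs)


# Hypothetical sample, chosen only for illustration.
sample = [3.1, -0.4, 7.5, 1.2, 2.8, 0.9, 4.6]

# Candidate values of theta on a grid with spacing 0.01 from -1.00 to 8.00.
grid = [i / 100 for i in range(-100, 801)]
best_theta = min(grid, key=lambda t: sum_abs_dev(sample, t))
sample_median = median(sample)

print(best_theta, sample_median)
```

With an even sample size, every value between the two middle observations gives the same minimum, which is why the text speaks of "a median" rather than "the median."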


10. The likelihood was given in Exercise 9. The logarithm of the likelihood equals

$$-n \log(2\sigma) - \frac{1}{\sigma}\sum_{i=1}^{n} |x_i - \theta|.$$

For convenience, assume that $x_1 < x_2 < \cdots < x_n$. Let $\theta$ be a given number between two consecutive
$x_i$ values. In particular, let $x_k < \theta < x_{k+1}$. For known $\sigma$, the logarithm of the likelihood can be written
as a constant plus a constant times

$$\sum_{i=k+1}^{n} x_i - (n - k)\theta - \sum_{i=1}^{k} x_i + k\theta.$$

For $\theta$ between $x_k$ and $x_{k+1}$, the derivative of this is $k - (n - k)$, the difference between the number of
observations below $\theta$ and the number above $\theta$.
11. Let $x_q$ be the q quantile of X. The result will follow if we can prove that the q quantile of $aX + b$ is
$a x_q + b$. Since

$$\Pr(aX + b \le a x_q + b) = \Pr(X \le x_q),$$

for all $a > 0$ and b and q, it follows that $a x_q + b$ is the q quantile of $aX + b$.


12. According to the solution to Exercise 11, the median of $aX + b$ is $am + b$, where m is the median of X.
The median absolute deviation of X is the median of $|X - m|$, which equals $\sigma$. The median absolute
deviation of $aX + b$ is the median of $|aX + b - (am + b)| = a|X - m|$. According to the solution to
Exercise 11, the median of $a|X - m|$ is a times the median of $|X - m|$, that is, $a\sigma$.


13. The Cauchy distribution is symmetric around 0, so the median is 0, and the median absolute deviation
is the median of $Y = |X|$. If F is the c.d.f. of X, then the c.d.f. of Y is

$$G(y) = \Pr(Y \le y) = \Pr(|X| \le y) = \Pr(-y \le X \le y) = F(y) - F(-y),$$

because X has a continuous distribution. Because X has a symmetric distribution around 0, $F(-y) =
1 - F(y)$, and $G(y) = 2F(y) - 1$. The median of Y is where $G(y) = 0.5$, that is $2F(y) - 1 = 0.5$ or
$F(y) = 0.75$. So, the median of Y is the 0.75 quantile of X, namely $y = 1$.


14. (a) The c.d.f. of X is $F(x) = 1 - \exp(-\lambda x)$, so the quantile function is $F^{-1}(p) = -\log(1 - p)/\lambda$. The
IQR is

$$F^{-1}(0.75) - F^{-1}(0.25) = -\frac{\log(0.25)}{\lambda} + \frac{\log(0.75)}{\lambda} = \frac{\log(3)}{\lambda}.$$

(b) The median of X is $\log(2)/\lambda$, and the median absolute deviation is the median of $|X - \log(2)/\lambda|$. It
is the value x such that $\Pr(\log(2)/\lambda - x < X < \log(2)/\lambda + x) = 0.5$. If we try letting $x = \log(3)/[2\lambda]$
(half of the IQR), then

$$\Pr(\log(2)/\lambda - x < X < \log(2)/\lambda + x) = [1 - \exp(-\log(2\sqrt{3}))] - [1 - \exp(-\log(2/\sqrt{3}))] = \frac{1}{2}\left[\sqrt{3} - 1/\sqrt{3}\right] = 0.5773.$$

This is greater than 0.5, so the median absolute deviation is smaller than 1/2 of the IQR.
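The comparison between the median absolute deviation and half of the IQR for the exponential distribution can be confirmed numerically. The sketch below picks a hypothetical rate `lam = 2.0` (any positive value would do), evaluates the probability at half the IQR, and then locates the median absolute deviation by bisection; all names are illustrative.

```python
import math

lam = 2.0  # hypothetical rate parameter, chosen only for illustration


def cdf(x):
    """C.d.f. of the exponential distribution with rate lam."""
    return 1.0 - math.exp(-lam * x) if x > 0 else 0.0


def quantile(p):
    """Quantile function F^{-1}(p) = -log(1 - p) / lam."""
    return -math.log(1.0 - p) / lam


iqr = quantile(0.75) - quantile(0.25)
med = quantile(0.5)


def central_prob(x):
    """Pr(med - x < X < med + x)."""
    return cdf(med + x) - cdf(med - x)


prob_half_iqr = central_prob(iqr / 2.0)

# Bisection for the x with central_prob(x) = 0.5; central_prob is increasing in x.
lo, hi = 0.0, iqr / 2.0
for _ in range(60):
    mid = (lo + hi) / 2.0
    if central_prob(mid) < 0.5:
        lo = mid
    else:
        hi = mid
mad = (lo + hi) / 2.0

print(round(iqr, 4), round(prob_half_iqr, 4), round(mad, 4))
```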


15. (a) The quantile function of the normal distribution with mean $\mu$ and variance $\sigma^2$ is the inverse of the
c.d.f., $F(x) = \Phi([x - \mu]/\sigma)$. So,

$$F^{-1}(p) = \mu + \sigma\Phi^{-1}(p). \qquad \text{(S.10.1)}$$

The IQR is

$$F^{-1}(0.75) - F^{-1}(0.25) = \sigma[\Phi^{-1}(0.75) - \Phi^{-1}(0.25)].$$

Since the standard normal distribution is symmetric around 0, $\Phi^{-1}(0.25) = -\Phi^{-1}(0.75)$, so the
IQR is $2\sigma\Phi^{-1}(0.75)$.

(b) Let F be the c.d.f. of a distribution that is symmetric around its median $\mu$. The median absolute
deviation is then the value x such that $F(\mu + x) - F(\mu - x) = 0.5$. By symmetry around the
median, we know that $F(\mu - x) = 1 - F(\mu + x)$, so x solves $2F(\mu + x) - 1 = 0.5$ or $F(\mu + x) = 0.75$.
That is, $x = F^{-1}(0.75) - \mu$. For the case of normal random variables, use Eq. (S.10.1) to conclude
that $x = \sigma\Phi^{-1}(0.75)$.
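The constant $\Phi^{-1}(0.75)$ that appears in both parts is approximately 0.6745, which is the divisor applied to the median absolute deviation in Exercise 16(d). A minimal sketch using Python's `statistics.NormalDist`, with a hypothetical mean and standard deviation, confirms both relationships:

```python
from statistics import NormalDist

# 0.75 quantile of the standard normal distribution.
z75 = NormalDist().inv_cdf(0.75)

# Hypothetical parameters, chosen only for illustration.
mu, sigma = 10.0, 3.0
dist = NormalDist(mu, sigma)

iqr = dist.inv_cdf(0.75) - dist.inv_cdf(0.25)   # part (a)
mad = dist.inv_cdf(0.75) - mu                   # part (b)

print(round(z75, 4), round(iqr / sigma, 4), round(mad / sigma, 4))
```

Dividing a sample median absolute deviation by 0.6745 therefore puts it on the scale of $\sigma$ when the data are normal.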
16. Here are the sorted values from smallest to largest:
-67, -48, 6, 8, 14, 16, 23, 24, 28, 29, 41, 49, 56, 60, 75.

(a) The average is 20.93.

(b) The trimmed means are

k            | 1      2      3      4
Trimmed mean | 23.54  26.73  25.78  25

(c) The sample median is 24.

(d) The median absolute deviation divided by 0.6745 is 25.20385. Starting at $\theta = 0$ and iterating the
procedure described on page 673 of the text, we get the following sequence of values for $\theta$:

20.805, 24.017, 26.278, 24.342, 24.373, 24.376, 24.377, 24.377, ...

After 9 iterations, the value stays the same to 7 significant digits.
17. Let $\mu$ stand for the median of the distribution, and let $\mu + c$ be the 0.75 quantile. By symmetry, the
0.25 quantile is $\mu - c$. Also, $f(\mu + c) = f(\mu - c)$. The large sample joint distribution of the 0.25 and
0.75 sample quantiles is a bivariate normal distribution with means $\mu - c$ and $\mu + c$, variances both
equal to $3/[16n f(\mu + c)^2]$, and covariance $1/[16n f(\mu + c)^2]$. The IQR is the difference between these
two sample quantiles, so its large sample distribution is normal with mean 2c and variance

$$\frac{3}{16n f(\mu + c)^2} + \frac{3}{16n f(\mu + c)^2} - 2\,\frac{1}{16n f(\mu + c)^2} = \frac{1}{4n f(\mu + c)^2}.$$


10.8 Sign and Rank Tests 


Commentary 


This section ends with a derivation of the power function of the Wilcoxon-Mann-Whitney ranks test. This 
derivation is a bit more technical than the rest of the section and is perhaps suitable only for the more 
mathematically inclined reader. 

If one is using the software R, the function wilcox.test performs the Wilcoxon-Mann-Whitney ranks 
test. The two arguments are the two samples whose distributions are being compared. 


Solutions to Exercises. 


1. Let W be the number of $(X_i, Y_i)$ pairs with $X_i < Y_i$. Then W has a binomial distribution with
parameters n and p. To test $H_0$, we reject $H_0$ if W is too large. In particular, if c is chosen so that

$$\sum_{w=c+1}^{n} \binom{n}{w}\left(\frac{1}{2}\right)^{n} \le \alpha_0 < \sum_{w=c}^{n} \binom{n}{w}\left(\frac{1}{2}\right)^{n},$$

then we can reject $H_0$ if $W > c$ for a level $\alpha_0$ test.


2. The largest difference between the two sample c.d.f.'s occurs between 2.336 and 2.431 and equals
$|0.8 - 0.125| = 0.675$. The test statistic is then

$$\left(\frac{8 \cdot 10}{8 + 10}\right)^{1/2} 0.675 = 1.423.$$

The tail area is between 0.0397 and 0.0298.


3. This test was performed in Example 9.6.5, and the tail area is 0.003.


4. By ordering all the observations, we obtain Table S.10.12. The sum of the ranks of $x_1, \ldots, x_{25}$ is
$S = 399$. Since $m = 25$ and $n = 15$, it follows from Eqs. (10.8.3) and (10.8.4) that $E(S) = 512.5$,
$\mathrm{Var}(S) = 1281.25$, and $\sigma = (1281.25)^{1/2} = 35.7946$. Hence, $Z = (399 - 512.5)/35.7946 = -3.17$. It can
be found from a table of the standard normal distribution that the corresponding two-sided tail area is
0.0015.


Table S.10.12: Table for Exercise 4 of Sec. 10.8.

        Observed                  Observed
Rank    value     Sample | Rank   value     Sample
 1      0.04      x      | 21     1.01      y
 2      0.13      x      | 22     1.07      y
 3      0.16      x      | 23     1.12      x
 4      0.28      x      | 24     1.15      x
 5      0.35      x      | 25     1.20      x
 6      0.39      x      | 26     1.25      y
 7      0.40      x      | 27     1.26      y
 8      0.44      x      | 28     1.31      y
 9      0.49      x      | 29     1.38      x
10      0.58      x      | 30     1.48      y
11      0.68      y      | 31     1.50      x
12      0.71      x      | 32     1.54      x
13      0.72      x      | 33     1.59      y
14      0.75      x      | 34     1.63      y
15      0.77      x      | 35     1.64      x
16      0.83      x      | 36     1.73      x
17      0.86      y      | 37     1.78      y
18      0.89      y      | 38     1.81      y
19      0.90      x      | 39     1.82      y
20      0.91      x      | 40     1.95      y
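The rank sum and its normal approximation can be recomputed from the two samples. The sketch below is a plain-Python illustration rather than a call to R's wilcox.test. The sample values are read off Table S.10.12, and the rank-23 value is taken to be 1.12, which is an assumption: it is the value that makes the first sample's mean agree with the 0.8044 reported in Exercise 6.

```python
import math

# First sample (x) and second sample (y), read from Table S.10.12.
# The value 1.12 is an assumption consistent with the sample mean 0.8044.
x = [0.04, 0.13, 0.16, 0.28, 0.35, 0.39, 0.40, 0.44, 0.49, 0.58,
     0.71, 0.72, 0.75, 0.77, 0.83, 0.90, 0.91, 1.12, 1.15, 1.20,
     1.38, 1.50, 1.54, 1.64, 1.73]
y = [0.68, 0.86, 0.89, 1.01, 1.07, 1.25, 1.26, 1.31, 1.48, 1.59,
     1.63, 1.78, 1.81, 1.82, 1.95]

m, n = len(x), len(y)

# Rank every observation in the combined sample (there are no ties here).
combined = sorted(x + y)
rank = {value: i + 1 for i, value in enumerate(combined)}
s = sum(rank[value] for value in x)

# Mean and variance of S under the null hypothesis, Eqs. (10.8.3) and (10.8.4).
mean_s = m * (m + n + 1) / 2
var_s = m * n * (m + n + 1) / 12
z = (s - mean_s) / math.sqrt(var_s)

# Two-sided tail area from the standard normal c.d.f.
two_sided = 2 * 0.5 * (1 + math.erf(-abs(z) / math.sqrt(2)))

print(s, mean_s, var_s, round(z, 2), round(two_sided, 4))
```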


5. Since there are 25 observations in the first sample, $F_m(x)$ will jump by the amount 0.04 at each observed
value. Since there are 15 observations in the second sample, $G_n(x)$ will jump by the amount 0.0667 at
each observed value. From the table given in the solution to Exercise 4, we obtain Table S.10.13. It
can be seen from this table that the maximum value of $|F_m(x) - G_n(x)|$ occurs when x is equal to the
observed value of rank 16, and its value at this point is $.60 - .0667 = .5333$. Hence, $D_{mn} = 0.5333$ and

$$\left(\frac{mn}{m + n}\right)^{1/2} D_{mn} = \left(\frac{375}{40}\right)^{1/2} (0.5333) = 1.633.$$

It is found from Table 10.32 that the corresponding tail area is almost exactly 0.01.


6. It is found from the values given in Tables 10.44 and 10.45 that $\bar{x} = \sum_{i=1}^{25} x_i/25 = 0.8044$, $\bar{y} =
\sum_{i=1}^{15} y_i/15 = 1.3593$, $S_x^2 = \sum_{i=1}^{25} (x_i - \bar{x})^2 = 5.8810$, and $S_y^2 = \sum_{i=1}^{15} (y_i - \bar{y})^2 = 2.2447$. Since $m = 25$
and $n = 15$, it follows from Eq. (9.6.3) that $U = -3.674$. It can be found from a table of the t
distribution with $m + n - 2 = 38$ degrees of freedom that the corresponding two-sided tail area is less
than 0.01.


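7. We need to show that $F(\theta + G^{-1}(p)) = p$. Compute

$$F(\theta + G^{-1}(p)) = \int_{-\infty}^{\theta + G^{-1}(p)} f(x)\,dx = \int_{-\infty}^{\theta + G^{-1}(p)} g(x - \theta)\,dx = \int_{-\infty}^{G^{-1}(p)} g(y)\,dy = G(G^{-1}(p)) = p,$$

where the third equality follows by making the change of variables $y = x - \theta$.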
Table S.10.13: Table for Exercise 5 of Sec. 10.8.

Rank of                           | Rank of
observations  F_m(x)   G_n(x)     | observations  F_m(x)   G_n(x)
 1            .04      0          | 21            .68      .2667
 2            .08      0          | 22            .68      .3333
 3            .12      0          | 23            .72      .3333
 4            .16      0          | 24            .76      .3333
 5            .20      0          | 25            .80      .3333
 6            .24      0          | 26            .80      .4000
 7            .28      0          | 27            .80      .4667
 8            .32      0          | 28            .80      .5333
 9            .36      0          | 29            .84      .5333
10            .40      0          | 30            .84      .6000
11            .40      .0667      | 31            .88      .6000
12            .44      .0667      | 32            .92      .6000
13            .48      .0667      | 33            .92      .6667
14            .52      .0667      | 34            .92      .7333
15            .56      .0667      | 35            .96      .7333
16            .60      .0667      | 36            1.00     .7333
17            .60      .1333      | 37            1.00     .8000
18            .60      .2000      | 38            1.00     .8667
19            .64      .2000      | 39            1.00     .9333
20            .68      .2000      | 40            1.00     1.0000


8. Since $Y + \theta$ and X have the same distribution, it follows that if $\theta > 0$ then the values in the first sample
will tend to be larger than the values in the second sample. In other words, when $\theta > 0$, the sum S of
the ranks in the first sample will tend to be larger than it would be if $\theta = 0$ or $\theta < 0$. Therefore, we
will reject $H_0$ if $Z > c$, where Z is as defined in this section and c is an appropriate constant. If we
want the test to have a specified level of significance $\alpha_0$ ($0 < \alpha_0 < 1$), then c should be chosen so that
when Z has a standard normal distribution, $\Pr(Z > c) = \alpha_0$. It should be kept in mind that the level
of significance of this test will only be approximately $\alpha_0$ because for finite sample sizes, the distribution
of Z will only be approximately a standard normal distribution when $\theta = 0$.


9. To test these hypotheses, add $\theta_0$ to each observed value $y_i$ in the second sample and then carry out the
Wilcoxon-Mann-Whitney procedure on the original values in the first sample and the new values in the
second sample.


10. For each value of $\theta_0$, carry out a test of the hypotheses given in Exercise 7 at the level of significance
$1 - \alpha$. The confidence interval for $\theta$ will contain all values of $\theta_0$ for which the null hypothesis $H_0$ would
be accepted.


11. Let $r_1 < r_2 < \cdots < r_m$ denote the ranks of the observed values in the first sample, and let $X_{i_1} < X_{i_2} <
\cdots < X_{i_m}$ denote the corresponding observed values. Then there are $r_1 - 1$ values of Y in the second
sample that are smaller than $X_{i_1}$. Hence, there are $r_1 - 1$ pairs $(X_{i_1}, Y_j)$ with $X_{i_1} > Y_j$. Similarly,
there are $r_2 - 2$ values of Y in the second sample that are smaller than $X_{i_2}$. Hence, there are $r_2 - 2$
pairs $(X_{i_2}, Y_j)$ with $X_{i_2} > Y_j$. By continuing in this way, we see that the number U is equal to

$$(r_1 - 1) + (r_2 - 2) + \cdots + (r_m - m) = \sum_{i=1}^{m} r_i - \sum_{i=1}^{m} i = S - \frac{1}{2}m(m + 1).$$


12. Using the result in Exercise 11, we find that $E(S) = E(U) + m(m+1)/2$, where U is defined in Exercise 11
to be the number of $(X_i, Y_j)$ pairs for which $X_i \ge Y_j$. So, we need to show that $E(U) = nm \Pr(X_1 \ge Y_1)$.
We can let $Z_{i,j} = 1$ if $X_i \ge Y_j$ and $Z_{i,j} = 0$ otherwise. Then

$$U = \sum_{i=1}^{m} \sum_{j=1}^{n} Z_{i,j}, \qquad \text{(S.10.2)}$$

and

$$E(U) = \sum_{i=1}^{m} \sum_{j=1}^{n} E(Z_{i,j}).$$

Since all of the $X_i$ are i.i.d. and all of the $Y_j$ are i.i.d., it follows that $E(Z_{i,j}) = E(Z_{1,1})$ for all i and j.
Of course $E(Z_{1,1}) = \Pr(X_1 \ge Y_1)$, so $E(U) = mn \Pr(X_1 \ge Y_1)$.


13. Since S and U differ by a constant, we need to show that Var(U) is given by Eq. (10.8.6). Once again,
write

$$U = \sum_{i} \sum_{j} Z_{i,j},$$

where $Z_{i,j} = 1$ if $X_i \ge Y_j$ and $Z_{i,j} = 0$ otherwise. Hence,

$$\mathrm{Var}(U) = \sum_{i} \sum_{j} \mathrm{Var}(Z_{i,j}) + \sum_{(i',j') \ne (i,j)} \mathrm{Cov}(Z_{i,j}, Z_{i',j'}).$$

The first sum is $mn[\Pr(X_1 \ge Y_1) - \Pr(X_1 \ge Y_1)^2]$. The second sum can be broken into three parts:

• The terms with $i' = i$ but $j' \ne j$.
• The terms with $j' = j$ but $i' \ne i$.
• The terms with both $i' \ne i$ and $j' \ne j$.

For the last set of terms $\mathrm{Cov}(Z_{i,j}, Z_{i',j'}) = 0$ since $(X_i, Y_j)$ is independent of $(X_{i'}, Y_{j'})$. For each term
in the first set

$$E(Z_{i,j} Z_{i,j'}) = \Pr(X_1 \ge Y_1, X_1 \ge Y_2),$$

so the covariances are

$$\mathrm{Cov}(Z_{i,j}, Z_{i,j'}) = \Pr(X_1 \ge Y_1, X_1 \ge Y_2) - \Pr(X_1 \ge Y_1)^2.$$

There are $mn(n - 1)$ terms of this sort. Similarly, for the second set of terms

$$\mathrm{Cov}(Z_{i,j}, Z_{i',j}) = \Pr(X_1 \ge Y_1, X_2 \ge Y_1) - \Pr(X_1 \ge Y_1)^2.$$

There are $nm(m - 1)$ of these terms. The variance is then

$$nm\left[\Pr(X_1 \ge Y_1) + (n - 1)\Pr(X_1 \ge Y_1, X_1 \ge Y_2) + (m - 1)\Pr(X_1 \ge Y_1, X_2 \ge Y_1) - (m + n - 1)\Pr(X_1 \ge Y_1)^2\right].$$
14. When $F = G$, $\Pr(X_1 \ge Y_1) = 1/2$, so Eq. (10.8.5) yields

$$E(S) = \frac{mn}{2} + \frac{m(m + 1)}{2} = \frac{m(m + n + 1)}{2},$$

which is the same as (10.8.3). When $F = G$,

$$\Pr(X_1 \ge Y_1, X_1 \ge Y_2) = 1/3 = \Pr(X_1 \ge Y_1, X_2 \ge Y_1),$$

so the corrected version of (10.8.6) yields

$$\mathrm{Var}(S) = nm\left[\frac{1}{2} - (m + n - 1)\frac{1}{4} + (m + n - 2)\frac{1}{3}\right] = \frac{mn}{12}[6 - 3m - 3n + 3 + 4m + 4n - 8] = \frac{mn(m + n + 1)}{12},$$

which is the same as (10.8.4).
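When $F = G$ and the distributions are continuous, every set of m ranks out of $1, \ldots, m+n$ is equally likely for the first sample, so the formulas (10.8.3) and (10.8.4) can be checked by direct enumeration for small samples. The sketch below does this exactly with rational arithmetic; the sample sizes are hypothetical and the function name is illustrative.

```python
from fractions import Fraction
from itertools import combinations


def rank_sum_moments(m, n):
    """Exact mean and variance of the rank sum S when all rank sets are equally likely."""
    sums = [sum(ranks) for ranks in combinations(range(1, m + n + 1), m)]
    count = len(sums)
    mean = Fraction(sum(sums), count)
    var = sum((s - mean) ** 2 for s in sums) / count
    return mean, var


# Hypothetical small sample sizes, chosen only for illustration.
m, n = 3, 2
mean_s, var_s = rank_sum_moments(m, n)

print(mean_s, Fraction(m * (m + n + 1), 2))      # enumeration vs. (10.8.3)
print(var_s, Fraction(m * n * (m + n + 1), 12))  # enumeration vs. (10.8.4)
```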
15. (a) Arrange the observations so that $|D_1| \le \cdots \le |D_n|$. Then $D_i > 0$ if and only if $X_i > Y_i$ if and
only if $W_i = 1$. Since rank i gets added into $S_W$ if and only if $D_i > 0$, we see that $\sum_{i=1}^{n} i W_i$ adds
just those ranks that correspond to positive $D_i$.

(b) Since the distribution of each $D_i$ is symmetric around 0, the magnitudes $|D_1|, \ldots, |D_n|$ are
independent of the sign indicators $W_1, \ldots, W_n$. Using the result of part (a), if we assume that the $|D_i|$
are ordered from smallest to largest, $E(S_W) = \sum_{i=1}^{n} i E(W_i)$. Since the $|D_i|$ are independent of the
$W_i$, we have $E(W_i) = 1/2$ even after we condition on the $|D_i|$ being arranged from smallest to
largest. Since $\sum_{i=1}^{n} i = n(n + 1)/2$, we have $E(S_W) = n(n + 1)/4$.

(c) Since the $W_i$ are independent before we condition on the $|D_i|$ and they are independent of the $|D_i|$,
then the $W_i$ are independent conditional on the $|D_i|$. Hence, $\mathrm{Var}(S_W) = \sum_{i=1}^{n} i^2 \mathrm{Var}(W_i)$. Since
$\mathrm{Var}(W_i) = 1/4$ for all i and $\sum_{i=1}^{n} i^2 = n(n + 1)(2n + 1)/6$, we have $\mathrm{Var}(S_W) = n(n + 1)(2n + 1)/24$.


16. For i =1,...,15, let 


D; = (thickness for material A in pair i) — (thickness for material B in pair i). 


(a) Of the 15 values of D;, 10 are positive, 3 are negative, and 2 are zero. If we first regard the 
zeroes as positive, then there are 12 positive differences with n = 15, and it is found from the 
binomial tables that the corresponding tail area is 0.0176. If we next regard the zeroes as negative, 
then there are only 10 positive differences with n = 15, and it is found from the tables that the 
corresponding tail area is 0.1509. The results are not conclusive because of the zeroes present in
the sample.


(b) For the Wilcoxon signed-ranks test, use Table S.10.14. Two different methods have been used. In
Method (I), the differences that are equal to 0 are regarded as positive, and whenever two or more
values of $|D_i|$ are tied, the positive differences $D_i$ are assigned the largest possible ranks and the
negative differences $D_i$ are assigned the smallest ranks. In Method (II), the differences that are 0
are regarded as negative, and among tied values of $|D_i|$, the negative differences are assigned the
largest ranks and the positive differences are assigned the smallest ranks. Let $S_n$ denote the sum
of the positive ranks. Since $n = 15$, $E(S_n) = 60$ and $\mathrm{Var}(S_n) = 310$. Hence, $\sigma_n = \sqrt{310} = 17.607$.
For Method (I), $S_n = 98$. Therefore, $Z_n = (98 - 60)/17.607 = 2.158$ and it is found from a table
of the standard normal distribution that the corresponding tail area is 0.0155. For Method (II),
$S_n = 92$. Therefore, $Z_n = 1.817$ and it is found that the corresponding tail area is 0.0346. By
either method of analysis, the null hypothesis would be rejected at the 0.05 level of significance,
but not at the 0.01 level.


Table S.10.14: Computation of Wilcoxon signed-ranks test statistic for Exercise 16b in Sec. 10.8.

               Method (I)      Method (I)    Method (II)     Method (II)
Pair   D_i     Rank of |D_i|   Signed rank   Rank of |D_i|   Signed rank
 1     -0.8     6               -6            7               -7
 2      1.6    12               12           11               11
 3     -0.5     5               -5            5               -5
 4      0.2     3                3            3                3
 5     -1.6    11              -11           13              -13
 6      0.2     4                4            4                4
 7      1.6    13               13           12               12
 8      1.0     9                9            9                9
 9      0.8     7                7            6                6
10      0.9     8                8            8                8
11      1.7    14               14           14               14
12      1.2    10               10           10               10
13      1.9    15               15           15               15
14      0       2                2            2               -2
15      0       1                1            1               -1


(c) The average of the pairwise differences (material A minus material B) is 0.5467. The value of $\sigma'$
computed from the differences is 1.0197, so the t statistic is 2.076, and the p-value is 0.0284.
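The numbers quoted in parts (a) through (c) can be reproduced from the 15 differences. The sketch below takes the differences $D_i$ and the two signed-rank sums (98 and 92) from Table S.10.14; the function name `upper_binomial_tail` is illustrative, and `math.comb` requires Python 3.8 or later.

```python
import math
from statistics import mean, stdev

# Differences D_i (material A minus material B) from Table S.10.14.
d = [-0.8, 1.6, -0.5, 0.2, -1.6, 0.2, 1.6, 1.0, 0.8, 0.9, 1.7, 1.2, 1.9, 0.0, 0.0]
n = len(d)


def upper_binomial_tail(n, k):
    """Pr(W >= k) when W is binomial with parameters n and 1/2."""
    return sum(math.comb(n, w) for w in range(k, n + 1)) / 2 ** n


# (a) Sign test: zeroes counted as positive (12 positives) or negative (10 positives).
tail_zeros_positive = upper_binomial_tail(n, 12)
tail_zeros_negative = upper_binomial_tail(n, 10)

# (b) Normal approximation for the signed-rank sum under the null hypothesis.
mean_s = n * (n + 1) / 4
sd_s = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
z_method_1 = (98 - mean_s) / sd_s
z_method_2 = (92 - mean_s) / sd_s

# (c) Paired t statistic; stdev uses the n - 1 divisor.
t_stat = mean(d) / (stdev(d) / math.sqrt(n))

print(round(tail_zeros_positive, 4), round(tail_zeros_negative, 4))
print(mean_s, round(sd_s, 3), round(z_method_1, 3), round(z_method_2, 3))
print(round(mean(d), 4), round(stdev(d), 4), round(t_stat, 3))
```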


10.9 Supplementary Exercises 


Solutions to Exercises 


1. Here, $\alpha_0/2 = 0.025$. From a table of binomial probabilities we find that

$$\sum_{x=0}^{5} \binom{20}{x} 0.5^{20} = 0.021 < 0.025 < \sum_{x=0}^{6} \binom{20}{x} 0.5^{20} = 0.058.$$

So, the sign test would reject the null hypothesis that $\theta = \theta_0$ if the number W of observations with values
at most $\theta_0$ satisfies either $W \le 5$ or $W \ge 20 - 5$. Equivalently, we would accept the null hypothesis
if $6 \le W \le 14$. This, in turn, is true if and only if $\theta_0$ is strictly between the sixth and fourteenth
ordered values of the original data. These values are 141 and 175, so our 95 percent confidence interval
is (141, 175).
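The two binomial tail probabilities that determine the cutoff can be computed directly instead of being read from a table. This is a small sketch with an illustrative helper `binom_cdf`; it checks that 5 is the largest cutoff whose lower tail stays below 0.025.

```python
import math


def binom_cdf(k, n, p):
    """Pr(W <= k) for a binomial distribution with parameters n and p."""
    return sum(math.comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(k + 1))


cdf5 = binom_cdf(5, 20, 0.5)
cdf6 = binom_cdf(6, 20, 0.5)

print(round(cdf5, 3), round(cdf6, 3))
```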
2. It follows from Eq. (10.1.2) that

$$Q = \sum_{i=1}^{5} \frac{(N_i - 80)^2}{80} = \frac{1}{80}\left[\sum_{i=1}^{5} N_i^2 - 2(80)(400) + 5(80)^2\right] = \frac{1}{80}\sum_{i=1}^{5} N_i^2 - 400.$$

It is found from the $\chi^2$ distribution with 4 degrees of freedom that $H_0$ should be rejected for $Q > 13.28$
or, equivalently, for $\sum_{i=1}^{5} N_i^2 > 80(413.28) = 33{,}062.4$.


3. Under $H_0$, the proportion $p_i^0$ of families with i boys is as follows:

i    p_i^0   n p_i^0
0    1/8     16
1    3/8     48
2    3/8     48
3    1/8     16

Hence, it follows from Eq. (10.1.2) that

$$Q = \frac{(26 - 16)^2}{16} + \frac{(32 - 48)^2}{48} + \frac{(40 - 48)^2}{48} + \frac{(30 - 16)^2}{16} = 25.1667.$$

Under $H_0$, Q has the $\chi^2$ distribution with 3 degrees of freedom. Hence, the tail area corresponding to
$Q = 25.1667$ is less than 0.005, the smallest probability in the table in the back of the book. It follows
that $H_0$ should be rejected for any level of significance greater than this tail area.
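The value of the statistic Q is a four-term sum and can be recomputed from the observed counts (26, 32, 40, 30) and the null proportions. A minimal sketch, with illustrative variable names:

```python
# Observed numbers of families with 0, 1, 2, 3 boys, and the null proportions.
observed = [26, 32, 40, 30]
probs = [1 / 8, 3 / 8, 3 / 8, 1 / 8]

n = sum(observed)
expected = [n * p for p in probs]

# Chi-square statistic: sum of (observed - expected)^2 / expected.
q = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(n, expected, round(q, 4))
```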


4. The likelihood function of p based on the observed data is

$$(q^3)^{26} (3pq^2)^{32} (3p^2q)^{40} (p^3)^{30} = (\text{const.})\, p^{202} q^{182},$$

where $q = 1 - p$. Hence, the M.L.E. of p based on these data is $\hat{p} = 202/384 = .526$. Under $H_0$, the
estimated expected proportion $\hat{p}_i^0$ of families with i boys is as follows:

i    estimated proportion                  n times estimated proportion
0    $\hat{q}^3 = .1065$                   13.632
1    $3\hat{p}\hat{q}^2 = .3545$           45.376
2    $3\hat{p}^2\hat{q} = .3935$           50.368
3    $\hat{p}^3 = .1455$                   18.624

It follows from Eq. (10.2.4) that

$$Q = \frac{(26 - 13.632)^2}{13.632} + \frac{(32 - 45.376)^2}{45.376} + \frac{(40 - 50.368)^2}{50.368} + \frac{(30 - 18.624)^2}{18.624} = 24.247.$$

Under $H_0$, Q has the $\chi^2$ distribution with 2 degrees of freedom. The tail area corresponding to
$Q = 24.247$ is again less than 0.005. $H_0$ should be rejected for any level of significance greater than
this tail area.


5. The expected numbers of observations in each cell, as specified by Eq. (10.4.4), are presented in the 
following table: 
A B AB O


Group 1 
Group 2 
Group 3 


It is now found from Eq. (10.4.3) that $Q = 6.9526$. Under the hypothesis $H_0$ that the distribution is
the same in all three groups, Q will have approximately the $\chi^2$ distribution with $3 \times 2 = 6$ degrees of
freedom. It is found from the tables that the 0.9 quantile of that distribution is 10.64, so $H_0$ should
not be rejected.


6. If Table 10.47 is changed in such a way that the row and column totals remain unchanged, then the
expected numbers given in the solution of Exercise 5 will remain unchanged. If we switch one person
in group 1 from B to AB and one person in group 2 from AB to B, then all row and column totals
will be unchanged and the new observed numbers in each of the four affected cells will be further from
their expected values than before. Hence, the value of the $\chi^2$ statistic Q as given by Eq. (10.4.3) is
increased. Continuing to switch persons in this way will continue to increase Q. There are other similar
switches that will also increase Q, such as switching one person in group 2 from O to A and one person
in group 3 from A to O.


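7. $$(N_{11} - E_{11})^2 = \left(N_{11} - \frac{N_{1+}N_{+1}}{n}\right)^2 = \left[N_{11} - \frac{(N_{11} + N_{12})(N_{11} + N_{21})}{n}\right]^2$$

$$= \frac{1}{n^2}\left[nN_{11} - (N_{11} + N_{12})(N_{11} + N_{21})\right]^2 = \frac{1}{n^2}(N_{11}N_{22} - N_{12}N_{21})^2,$$

since $n = N_{11} + N_{12} + N_{21} + N_{22}$. Exactly the same value is obtained for $(N_{12} - E_{12})^2$, $(N_{21} - E_{21})^2$,
and $(N_{22} - E_{22})^2$.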


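8. It follows from Eq. (10.3.4) and Exercise 7 that

$$Q = \frac{1}{n^2}(N_{11}N_{22} - N_{12}N_{21})^2 \sum_{i=1}^{2}\sum_{j=1}^{2} \frac{1}{E_{ij}}.$$

But

$$\sum_{i=1}^{2}\sum_{j=1}^{2} \frac{1}{E_{ij}} = \frac{n}{N_{1+}N_{+1}} + \frac{n}{N_{1+}N_{+2}} + \frac{n}{N_{2+}N_{+1}} + \frac{n}{N_{2+}N_{+2}}$$

$$= \frac{n(N_{2+}N_{+2} + N_{2+}N_{+1} + N_{1+}N_{+2} + N_{1+}N_{+1})}{N_{1+}N_{2+}N_{+1}N_{+2}} = \frac{n^3}{N_{1+}N_{2+}N_{+1}N_{+2}},$$

since $N_{1+} + N_{2+} = N_{+1} + N_{+2} = n$. Hence, Q has the specified form.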
9. In this exercise, $N_{1+} = N_{2+} = N_{+1} = N_{+2} = 2n$ and $N_{11}N_{22} - N_{12}N_{21} = (n + a)^2 - (n - a)^2 = 4na$.
It now follows from Exercise 8 (after we replace n by 4n in the expression for Q) that $Q = 4a^2/n$.
Since $H_0$ should be rejected if $Q > 6.635$, it follows that $H_0$ should be rejected if $a > (6.635n)^{1/2}/2$ or
$a < -(6.635n)^{1/2}/2$.


10. In this exercise $N_{1+} = N_{2+} = N_{+1} = N_{+2} = n$ and $N_{11}N_{22} - N_{12}N_{21} = (2a - 1)n^2$. It now follows from
Exercise 8 (after we replace n by 2n in the expression for Q) that $Q = 2n(2a - 1)^2$. Since $H_0$ should
be rejected if $Q > 3.841$, it follows that $H_0$ should be rejected if either

$$a > \frac{1}{2}\left[1 + \left(\frac{3.841}{2n}\right)^{1/2}\right]$$

or

$$a < \frac{1}{2}\left[1 - \left(\frac{3.841}{2n}\right)^{1/2}\right].$$


11. Results of this type are an example of Simpson's paradox. If there is a higher rate of respiratory diseases
among older people than among younger people, and if city A has a higher proportion of older people
than city B, then results of this type can very well occur.


12. Results of this type are another example of Simpson's paradox. If scores on the test tend to be higher
for certain classes, such as seniors and juniors, and lower for the other classes, such as freshmen and 
sophomores, and if school B has a higher proportion of seniors and juniors than school A, then results 
of this type can very well occur. 


13. The fundamental aspect of this exercise is that it is not possible to assess the effectiveness of the
treatment without having any information about how the levels of depression of the patients would 
have changed over the three-month period if they had not received the treatment. In other words, 
without the presence of a control group of similar patients who received some other standard treatment 
or no treatment at all, there is little meaningful statistical analysis that can be carried out. We can 
compare the proportion of patients at various levels who showed improvement after the treatment with 
the proportion who remained the same or worsened, but without a control group we have no way of 
deciding whether these proportions are unusually large or small. 


14. If $Y_1 < Y_2 < Y_3$ are the order statistics of the sample, then $Y_2$ is the sample median. For $0 < y < 1$,

$$G(y) = \Pr(Y_2 < y)$$
$$= \Pr(\text{At least two obs.} < y)$$
$$= \Pr(\text{Exactly two obs.} < y) + \Pr(\text{All three obs.} < y)$$
$$= 3(y^{\theta})^2(1 - y^{\theta}) + (y^{\theta})^3$$
$$= 3y^{2\theta} - 2y^{3\theta}.$$

Hence, for $0 < y < 1$, the p.d.f. of $Y_2$ is $g(y) = G'(y) = 6\theta(y^{2\theta - 1} - y^{3\theta - 1})$.

15. The c.d.f. of this distribution is $F(x) = x^{\theta}$, so the median of the distribution is the point m such
that $m^{\theta} = 1/2$. Thus, $m = (1/2)^{1/\theta}$ and $f(m) = \theta 2^{1/\theta}/2$. It follows from Theorem 10.7.1 that the
asymptotic distribution of the sample median will be normal with mean m and variance

$$\frac{1}{4n f^2(m)} = \frac{1}{n\theta^2 2^{2/\theta}}.$$
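16. We know from Exercise 1 of Sec. 8.4 that the variance of the t distribution is finite only for $\alpha > 2$ and
its value is $\alpha/(\alpha - 2)$. Hence, it follows from the central limit theorem that for $\alpha > 2$, the asymptotic
distribution of $\bar{X}_n$ will be normal with mean 0 and variance

$$\sigma_1^2 = \frac{\alpha}{n(\alpha - 2)}.$$

Since the median of the t distribution is 0, it follows from Theorem 10.7.1 (with n replaced by $\alpha$) that
the asymptotic distribution of $\tilde{X}_n$ will be normal with mean 0 and variance

$$\sigma_2^2 = \frac{\alpha\pi\,\Gamma^2\left(\frac{\alpha}{2}\right)}{4n\,\Gamma^2\left(\frac{\alpha + 1}{2}\right)}.$$

Thus, $\sigma_1^2 < \sigma_2^2$ if and only if

$$\frac{\pi(\alpha - 2)\,\Gamma^2\left(\frac{\alpha}{2}\right)}{4\,\Gamma^2\left(\frac{\alpha + 1}{2}\right)} > 1.$$

The left side equals $\pi^2/16$ for $\alpha = 3$, equals $8/9$ for $\alpha = 4$, and equals $27\pi^2/(16)^2 = 1.04$ for $\alpha = 5$.
Thus, $\sigma_1^2 < \sigma_2^2$ for $\alpha = 5, 6, 7, \ldots$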
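The comparison of the two asymptotic variances can be checked numerically. The sketch below computes the ratio of the sample median's asymptotic variance to the sample mean's for a t distribution with $\alpha$ degrees of freedom, using the standard fact that the t density at 0 is $\Gamma((\alpha+1)/2)/[(\alpha\pi)^{1/2}\Gamma(\alpha/2)]$. The factor n cancels in the ratio, and the function name is illustrative.

```python
import math


def variance_ratio(alpha):
    """Asymptotic Var(sample median) / Var(sample mean) for the t distribution.

    alpha is the number of degrees of freedom and must exceed 2.
    """
    density_at_zero = math.gamma((alpha + 1) / 2) / (
        math.sqrt(alpha * math.pi) * math.gamma(alpha / 2)
    )
    n_var_median = 1.0 / (4.0 * density_at_zero ** 2)
    n_var_mean = alpha / (alpha - 2)
    return n_var_median / n_var_mean


ratios = {alpha: variance_ratio(alpha) for alpha in range(3, 8)}
for alpha, ratio in ratios.items():
    print(alpha, round(ratio, 4))
```

A ratio above 1 means the sample mean has the smaller asymptotic variance; a ratio below 1 favors the sample median.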


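17. As shown in Exercise 5 of Sec. 10.7, $E(\bar{X}_n) = E(\tilde{X}_n) = \theta$, so the M.S.E. of each of these estimators is
equal to its variance. Furthermore, $\mathrm{Var}(\bar{X}_n) = \frac{1}{n}[\alpha \cdot 1 + (1 - \alpha)\sigma^2]$ and

$$\mathrm{Var}(\tilde{X}_n) = \frac{1}{4n[h(\theta \mid \theta)]^2},$$

where

$$h(\theta \mid \theta) = \frac{1}{(2\pi)^{1/2}}\left(\alpha + \frac{1 - \alpha}{\sigma}\right).$$

(a) For $\sigma^2 = 100$, $\mathrm{Var}(\tilde{X}_n) < \mathrm{Var}(\bar{X}_n)$ if and only if

$$\frac{50\pi}{[10\alpha + (1 - \alpha)]^2} < \alpha + 100(1 - \alpha).$$

Some numerical calculations show that this inequality is satisfied for $.031 < \alpha < .994$.

(b) For $\alpha = 1/2$, $\mathrm{Var}(\tilde{X}_n) < \mathrm{Var}(\bar{X}_n)$ if and only if $\sigma < .447$ or $\sigma > 1/.447 = 2.237$.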
18. The simplest and most intuitive way to establish this result is to note that for any fixed values
$y_1 < y_2 < \cdots < y_n$,

$$g(y_1, \ldots, y_n)\,\Delta y_1 \cdots \Delta y_n \approx \Pr(y_1 < Y_1 < y_1 + \Delta y_1, \ldots, y_n < Y_n < y_n + \Delta y_n)$$
$$= \Pr(\text{Exactly one observation in the interval } (y_j, y_j + \Delta y_j) \text{ for } j = 1, \ldots, n)$$
$$= n! \prod_{j=1}^{n} [F(y_j + \Delta y_j) - F(y_j)] \approx n! \prod_{j=1}^{n} [f(y_j)\,\Delta y_j] = n!\, f(y_1) \cdots f(y_n)\,\Delta y_1 \cdots \Delta y_n,$$

where the factor n! appears because there are n! different arrangements of $X_1, \ldots, X_n$ such that exactly
one of them is in each of the intervals $(y_j, y_j + \Delta y_j)$, $j = 1, \ldots, n$. Another, and somewhat more
complicated, way to establish this result is to determine the general form of the joint c.d.f. $G(y_1, \ldots, y_n)$
of $Y_1, \ldots, Y_n$ for $y_1 < y_2 < \cdots < y_n$, and then to note that

$$g(y_1, \ldots, y_n) = \frac{\partial^n G(y_1, \ldots, y_n)}{\partial y_1 \cdots \partial y_n} = n!\, f(y_1) \cdots f(y_n).$$


19. It follows from Exercise 18 that the joint p.d.f. $g(y_1, y_2, y_3) = 3!$, a constant, for $0 < y_1 < y_2 < y_3 < 1$.
Since the required conditional p.d.f. of $Y_2$ is proportional to $g(y_1, y_2, y_3)$, as a function of $y_2$ for fixed $y_1$
and $y_3$, it follows that this conditional p.d.f. is also constant. In other words, the required conditional
distribution is uniform on the interval $(y_1, y_3)$.


20. We have $Y_r < \theta < Y_{r+3}$ if and only if at least r observations and at most r + 2 observations are below $\theta$.
Let X stand for the number of observations out of the sample of size 20 that are below $\theta$. Then X has
a binomial distribution with parameters 20 and 0.3. It follows that

$$\Pr(Y_r < \theta < Y_{r+3}) = \Pr(r \le X \le r + 2).$$

For each value of r, we can find this probability using a binomial distribution table or a computer. By
searching through all values of r, we find that r = 5 yields a probability of 0.5348, which is the highest.
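The search over r is easy to carry out by computer. The sketch below evaluates $\Pr(r \le X \le r + 2)$ for a binomial distribution with parameters 20 and 0.3 at every r for which $Y_{r+3}$ exists, and picks the maximizer; the helper names are illustrative.

```python
import math

n, p = 20, 0.3


def binom_pmf(x):
    """Pr(X = x) for the binomial distribution with parameters n and p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


def coverage(r):
    """Pr(r <= X <= r + 2), the probability that Y_r < theta < Y_{r+3}."""
    return sum(binom_pmf(x) for x in range(r, r + 3))


# r ranges over 1, ..., n - 3 so that the order statistic Y_{r+3} exists.
candidates = range(1, n - 2)
best_r = max(candidates, key=coverage)

print(best_r, round(coverage(best_r), 4))
```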


21. As shown in Exercise 10 of Sec. 10.8, we add $\theta_0$ to each observation $Y_j$ and then carry out the Wilcoxon-Mann-Whitney
test on the sum $S_{\theta_0}$ of the ranks of the $X_i$'s among these new values $Y_1 + \theta_0, \ldots, Y_n + \theta_0$.
We accept $H_0$ if and only if

$$\frac{|S_{\theta_0} - E(S)|}{[\mathrm{Var}(S)]^{1/2}} < \Phi^{-1}\left(1 - \frac{\alpha}{2}\right),$$

where E(S) and Var(S) are given by (10.8.3) and (10.8.4). However, by Exercise 11 of Sec. 10.8,

$$S_{\theta_0} = U_{\theta_0} + \frac{1}{2}m(m + 1).$$

When we make this substitution for $S_{\theta_0}$ in the above inequality, we obtain the desired result.


22. We know from general principles that the set of all values $\theta_0$ for which $H_0$ would be accepted in
Exercise 21 will form a confidence interval with the required confidence coefficient $1 - \alpha$. But if $U_{\theta_0}$, the
number of differences $X_i - Y_j$ that are greater than $\theta_0$, is greater than the lower limit given in Exercise 21
then $\theta_0$ must be less than B. Similarly, if $U_{\theta_0}$ is less than the upper limit given in Exercise 22, then $\theta_0$
must be greater than A. Hence, $A < \theta < B$ is a confidence interval.


Copyright © 2012 Pearson Education, Inc. Publishing as Addison-Wesley. 


23. (a) We know that $\theta_p = b$ if and only if $\Pr(X \le b) = p$. So, let $Y_i = 1$ if $X_i \le b$ and $Y_i = 0$ if not.
Then $Y_1, \ldots, Y_n$ are i.i.d. with a Bernoulli distribution with parameter $p$ if and only if $H_0$ is true.
Define $W = \sum_{i=1}^n Y_i$. To test $H_0$, reject $H_0$ if $W$ is too big or too small. For an equal-tailed level
$\alpha_0$ test, choose two numbers $c_1 < c_2$ such that

$$\sum_{w=0}^{c_1} \binom{n}{w} p^w (1-p)^{n-w} \le \frac{\alpha_0}{2} < \sum_{w=0}^{c_1+1} \binom{n}{w} p^w (1-p)^{n-w},$$

$$\sum_{w=c_2}^{n} \binom{n}{w} p^w (1-p)^{n-w} \le \frac{\alpha_0}{2} < \sum_{w=c_2-1}^{n} \binom{n}{w} p^w (1-p)^{n-w}.$$

Then a level $\alpha_0$ test rejects $H_0$ if $W \le c_1$ or $W \ge c_2$.


(b) For each $b$, we have shown how to construct a test of $H_{0,b}: \theta_p = b$. For given observed data
$X_1, \ldots, X_n$, find all values of $b$ such that the test constructed in part (a) accepts $H_{0,b}$. The set
of all such $b$ forms our coefficient $1 - \alpha_0$ confidence interval. It is clear from the form of the test
that, once we find three values $b_1 < b_2 < b_3$ such that $H_{0,b_2}$ is accepted and $H_{0,b_1}$ and $H_{0,b_3}$ are
rejected, we don't have to check any more values of $b < b_1$ or $b > b_3$ since all of those would be
rejected also. Similarly, if we find $b_4 < b_5$ such that both $H_{0,b_4}$ and $H_{0,b_5}$ are accepted, then so are
$H_{0,b}$ for all $b_4 < b < b_5$. This will save some time locating all of the necessary $b$ values.


Linear Statistical Models


11.1 The Method of Least Squares 


Commentary 


If one is using the software R, the functions lsfit and lm will perform least squares. While lsfit has
simpler syntax, lm is more powerful. The first argument to lsfit is a matrix or vector with one row for each
observation and one column for each x variable in the notation of the text (call this x). The second argument
is a vector of the response values, one for each observation (call this y). By default, an intercept is fit. To
prevent an intercept from being fit, use the optional argument intercept=FALSE. To perform the fit and store
the result in regfit, use regfit=lsfit(x,y). The result regfit is a "list" which contains (among other
things) coef, the vector of coefficients $\hat\beta_0, \ldots, \hat\beta_k$ in the notation of the text, and residuals, which are defined
later in the text. To access the parts of regfit, use regfit$coef, etc. To use lm, regfit=lm(y~x) will
perform least squares with an intercept and store the result in regfit. To prevent an intercept from being fit,
use regfit=lm(y~x-1). The result of lm also contains coefficients and residuals plus fitted.values,
which equals the original y minus residuals. The components of the output are accessed as above.

The plot function in R is useful for visualizing data in linear models. In the notation above, suppose
that x has only one column. Then plot(x,y) will produce a scatterplot of y versus x. The least-squares
line can be added to the plot by lines(x,regfit$fitted.values). (If one used lsfit, one can create the
fitted values by regfit$fitted.values=y-regfit$residuals.)
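
For readers without R at hand, the quantities described above (coefficients, residuals, and fitted values equal to y minus residuals) can be sketched in plain Python for the one-predictor case. This is not the lsfit or lm implementation; the function name, the returned dictionary keys, and the data are all made up for illustration.

```python
def least_squares_line(x, y, intercept=True):
    """Fit y = b0 + b1*x by least squares (b0 is fixed at 0 when intercept=False)."""
    n = len(x)
    if intercept:
        xbar = sum(x) / n
        ybar = sum(y) / n
        sxx = sum((xi - xbar) ** 2 for xi in x)
        b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
        b0 = ybar - b1 * xbar
    else:
        b0 = 0.0
        b1 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return {"coef": (b0, b1), "residuals": residuals, "fitted": fitted}

# Made-up data, used only to exercise the function.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]
fit = least_squares_line(x, y)
print(fit["coef"])
```

When an intercept is fit, the residuals sum to zero, which gives a quick sanity check on any implementation.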


Solutions to Exercises 
1. First write $c_1 x_i + c_2 = c_1(x_i - \bar{x}_n) + (c_1 \bar{x}_n + c_2)$ for every $i$. Then

$$(c_1 x_i + c_2)^2 = c_1^2 (x_i - \bar{x}_n)^2 + (c_1 \bar{x}_n + c_2)^2 + 2c_1 (x_i - \bar{x}_n)(c_1 \bar{x}_n + c_2).$$

The sum over all $i$ from 1 to $n$ of the first two terms on the right produces the formula we desire. The
sum of the last term over all $i$ is 0 because $c_1(c_1 \bar{x}_n + c_2)$ is the same for all $i$ and $\sum_{i=1}^n (x_i - \bar{x}_n) = 0$.


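2. (a) The result can be obtained from Eq. (11.1.1) and the following relations:

$$
\begin{aligned}
\sum_{i=1}^n (x_i - \bar{x}_n)(y_i - \bar{y}_n) &= \sum_{i=1}^n (x_i y_i - \bar{x}_n y_i - \bar{y}_n x_i + \bar{x}_n \bar{y}_n) \\
&= \sum_{i=1}^n x_i y_i - \bar{x}_n \sum_{i=1}^n y_i - \bar{y}_n \sum_{i=1}^n x_i + n \bar{x}_n \bar{y}_n \\
&= \sum_{i=1}^n x_i y_i - n \bar{x}_n \bar{y}_n - n \bar{x}_n \bar{y}_n + n \bar{x}_n \bar{y}_n \\
&= \sum_{i=1}^n x_i y_i - n \bar{x}_n \bar{y}_n,
\end{aligned}
$$

and

$$
\begin{aligned}
\sum_{i=1}^n (x_i - \bar{x}_n)^2 &= \sum_{i=1}^n (x_i^2 - 2\bar{x}_n x_i + \bar{x}_n^2) \\
&= \sum_{i=1}^n x_i^2 - 2\bar{x}_n \sum_{i=1}^n x_i + n \bar{x}_n^2 \\
&= \sum_{i=1}^n x_i^2 - 2n \bar{x}_n^2 + n \bar{x}_n^2 \\
&= \sum_{i=1}^n x_i^2 - n \bar{x}_n^2.
\end{aligned}
$$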

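(b) The result can be obtained from part (a) and the following relation:

$$
\begin{aligned}
\sum_{i=1}^n (x_i - \bar{x}_n)(y_i - \bar{y}_n) &= \sum_{i=1}^n (x_i - \bar{x}_n) y_i - \sum_{i=1}^n (x_i - \bar{x}_n) \bar{y}_n \\
&= \sum_{i=1}^n (x_i - \bar{x}_n) y_i - \bar{y}_n \sum_{i=1}^n (x_i - \bar{x}_n) \\
&= \sum_{i=1}^n (x_i - \bar{x}_n) y_i, \quad \text{since } \sum_{i=1}^n (x_i - \bar{x}_n) = 0.
\end{aligned}
$$

(c) This result can be obtained from part (a) and the following relation:

$$
\begin{aligned}
\sum_{i=1}^n (x_i - \bar{x}_n)(y_i - \bar{y}_n) &= \sum_{i=1}^n x_i (y_i - \bar{y}_n) - \bar{x}_n \sum_{i=1}^n (y_i - \bar{y}_n) \\
&= \sum_{i=1}^n x_i (y_i - \bar{y}_n), \quad \text{since } \sum_{i=1}^n (y_i - \bar{y}_n) = 0.
\end{aligned}
$$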


3. It must be shown that $\bar{y}_n = \hat\beta_0 + \hat\beta_1 \bar{x}_n$. But this result follows immediately from the expression for $\hat\beta_0$
given in Eq. (11.1.1).

4. Since the values of $\beta_0$ and $\beta_1$ to be chosen must satisfy the relation $\partial Q / \partial \beta_0 = 0$, it follows from
Eq. (11.1.3) that they must satisfy the relation $\sum_{i=1}^n (y_i - \hat{y}_i) = 0$. Similarly, since they must also satisfy
the relation $\partial Q / \partial \beta_1 = 0$, it follows from Eq. (11.1.4) that $\sum_{i=1}^n (y_i - \hat{y}_i) x_i = 0$. These two relations are
equivalent to the normal equations (11.1.5), for which $\hat\beta_0$ and $\hat\beta_1$ are the unique solution.


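5. The least squares line will have the form $x = \hat\gamma_0 + \hat\gamma_1 y$, where $\hat\gamma_0$ and $\hat\gamma_1$ are defined similarly to $\hat\beta_0$ and
$\hat\beta_1$ in Eq. (11.1.1) with the roles of $x$ and $y$ interchanged. Thus,

$$\hat\gamma_1 = \frac{\sum_{i=1}^n x_i y_i - n \bar{x}_n \bar{y}_n}{\sum_{i=1}^n y_i^2 - n \bar{y}_n^2}$$

and $\hat\gamma_0 = \bar{x}_n - \hat\gamma_1 \bar{y}_n$. It is found that $\hat\gamma_1 = 0.9394$ and $\hat\gamma_0 = 1.5691$. Hence, the least squares line is
$x = 1.5691 + 0.9394y$ or, equivalently, $y = -1.6703 + 1.0645x$. This line and the line $y = -0.786 + 0.685x$
given in Fig. 11.4 can now be sketched on the same graph.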
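
The step from the fitted line for x on y to the equivalent expression for y in terms of x is a one-line rearrangement. A small sketch (the helper name is hypothetical) confirms the arithmetic:

```python
def invert_line(g0, g1):
    """Rewrite x = g0 + g1*y as y = a + b*x and return (a, b). Requires g1 != 0."""
    return (-g0 / g1, 1.0 / g1)

a, b = invert_line(1.5691, 0.9394)
print(round(a, 4), round(b, 4))
```

Note that the slope $1/0.9394$ obtained this way differs from the slope 0.685 of the regression of y on x, which is why the two lines in the sketch do not coincide.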
6. The sum of the squares of the deviations from the least squares parabola is the minimum possible value
of $\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i - \beta_2 x_i^2)^2$ as $\beta_0$, $\beta_1$, and $\beta_2$ vary over all possible real numbers. The sum of the
squares of the deviations from the least squares line is the minimum possible value of this same sum
when $\beta_2$ is restricted to have the value 0. This minimum cannot therefore be smaller than the first
minimum.


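7. (a) Here, $n = 8$, $\bar{x}_n = 2.25$, $\bar{y}_n = 42.125$, $\sum_{i=1}^n x_i y_i = 764$, and $\sum_{i=1}^n x_i^2 = 51$. Therefore, by Eq. (11.1.1),
$\hat\beta_1 = 0.548$ and $\hat\beta_0 = 40.893$. Hence, the least squares line is $y = 40.893 + 0.548x$.

(b) The normal equations (11.1.8) are found to be:

$$
\begin{aligned}
8\beta_0 + 18\beta_1 + 51\beta_2 &= 337, \\
18\beta_0 + 51\beta_1 + 162\beta_2 &= 764, \\
51\beta_0 + 162\beta_1 + 548.25\beta_2 &= 2167.5.
\end{aligned}
$$

Solving these three simultaneous linear equations, we obtain the solution:

$$\hat\beta_0 = 38.483, \quad \hat\beta_1 = 3.440, \quad \text{and} \quad \hat\beta_2 = -0.643.$$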
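
Both parts of this exercise can be checked numerically. The sketch below uses a small hand-written Gaussian elimination routine (an illustrative helper, not something from the text) together with the summary statistics quoted in part (a) and the normal equations of part (b).

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

# Part (a): least squares line from the summary statistics.
n, xbar, ybar, sum_xy, sum_x2 = 8, 2.25, 42.125, 764.0, 51.0
b1 = (sum_xy - n * xbar * ybar) / (sum_x2 - n * xbar**2)
b0 = ybar - b1 * xbar

# Part (b): least squares parabola from the normal equations.
parabola = solve_linear([[8, 18, 51], [18, 51, 162], [51, 162, 548.25]],
                        [337, 764, 2167.5])
print(round(b0, 3), round(b1, 3), [round(v, 3) for v in parabola])
```

The results should agree with the printed coefficients up to rounding in the last digit.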


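8. If the polynomial is to pass through the $k + 1$ given points, then the following $k + 1$ equations must be
satisfied:

$$
\begin{aligned}
\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_1^k &= y_1, \\
\beta_0 + \beta_1 x_2 + \cdots + \beta_k x_2^k &= y_2, \\
&\;\;\vdots \\
\beta_0 + \beta_1 x_{k+1} + \cdots + \beta_k x_{k+1}^k &= y_{k+1}.
\end{aligned}
$$

These equations form a set of $k + 1$ simultaneous linear equations in $\beta_0, \ldots, \beta_k$. There will be a unique
polynomial having the required properties if and only if these equations have a unique solution. These
equations will have a unique solution if and only if the $(k+1) \times (k+1)$ matrix of coefficients of $\beta_0, \ldots, \beta_k$
is nonsingular (i.e., has a nonzero determinant). Thus, it must be shown that

$$
\det \begin{bmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^k \\
1 & x_2 & x_2^2 & \cdots & x_2^k \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{k+1} & x_{k+1}^2 & \cdots & x_{k+1}^k
\end{bmatrix} \neq 0.
$$

This determinant will be 0 if and only if the $k + 1$ columns of the matrix are linearly dependent; i.e.,
if and only if there exist constants $a_1, \ldots, a_{k+1}$, not all of which are 0, such that

$$
a_1 \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}
+ a_2 \begin{bmatrix} x_1 \\ \vdots \\ x_{k+1} \end{bmatrix}
+ a_3 \begin{bmatrix} x_1^2 \\ \vdots \\ x_{k+1}^2 \end{bmatrix}
+ \cdots
+ a_{k+1} \begin{bmatrix} x_1^k \\ \vdots \\ x_{k+1}^k \end{bmatrix}
= \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}.
$$

But if such constants exist, then the $k + 1$ distinct values $x_1, \ldots, x_{k+1}$ will all be roots of the equation

$$a_1 + a_2 x + a_3 x^2 + \cdots + a_{k+1} x^k = 0.$$

It is impossible, however, for a polynomial of degree $k$ or less to have $k + 1$ distinct roots unless all the
coefficients $a_1, \ldots, a_{k+1}$ are 0. It now follows that the determinant cannot be 0.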
9. The normal equations (11.1.13) are found to be

$$
\begin{aligned}
10\beta_0 + 1170\beta_1 + 18\beta_2 &= 1359, \\
1170\beta_0 + 138{,}100\beta_1 + 2130\beta_2 &= 160{,}380, \\
18\beta_0 + 2130\beta_1 + 38\beta_2 &= 2483.
\end{aligned}
$$

Solving these three simultaneous linear equations, we obtain the solution $\hat\beta_0 = 3.7148$, $\hat\beta_1 = 1.1013$, and
$\hat\beta_2 = 1.8517$.


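10. We begin by taking the partial derivative of the following sum with respect to $\beta_0$, $\beta_1$, and $\beta_2$,
respectively:

$$\sum_{i=1}^n (y_i - \beta_0 x_{i1} - \beta_1 x_{i2} - \beta_2 x_{i2}^2)^2.$$

By setting each of these derivatives equal to 0, we obtain the following normal equations:

$$
\begin{aligned}
\beta_0 \sum_{i=1}^n x_{i1}^2 + \beta_1 \sum_{i=1}^n x_{i1} x_{i2} + \beta_2 \sum_{i=1}^n x_{i1} x_{i2}^2 &= \sum_{i=1}^n x_{i1} y_i, \\
\beta_0 \sum_{i=1}^n x_{i1} x_{i2} + \beta_1 \sum_{i=1}^n x_{i2}^2 + \beta_2 \sum_{i=1}^n x_{i2}^3 &= \sum_{i=1}^n x_{i2} y_i, \\
\beta_0 \sum_{i=1}^n x_{i1} x_{i2}^2 + \beta_1 \sum_{i=1}^n x_{i2}^3 + \beta_2 \sum_{i=1}^n x_{i2}^4 &= \sum_{i=1}^n x_{i2}^2 y_i.
\end{aligned}
$$

When the given numerical data are used, these equations are found to be:

$$
\begin{aligned}
138{,}100\beta_0 + 2130\beta_1 + 4550\beta_2 &= 160{,}380, \\
2130\beta_0 + 38\beta_1 + 90\beta_2 &= 2483, \\
4550\beta_0 + 90\beta_1 + 230\beta_2 &= 5305.
\end{aligned}
$$

Solving these three simultaneous linear equations, we obtain the solution $\hat\beta_0 = 1.0270$, $\hat\beta_1 = 17.2934$,
and $\hat\beta_2 = -4.0186$.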
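
As a cross-check on Exercises 9 and 10, the two printed systems of normal equations can be solved with Cramer's rule. The helpers below are illustrative only. Because the coefficients of these systems differ by several orders of magnitude, the solution is sensitive to rounding, so one should expect agreement with the printed coefficients only to about two decimal places.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def cramer3(m, rhs):
    """Solve a 3x3 linear system by Cramer's rule."""
    d = det3(m)
    solution = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]   # replace one column by the right-hand side
        solution.append(det3(mc) / d)
    return solution

ex9 = cramer3([[10, 1170, 18], [1170, 138100, 2130], [18, 2130, 38]],
              [1359, 160380, 2483])
ex10 = cramer3([[138100, 2130, 4550], [2130, 38, 90], [4550, 90, 230]],
               [160380, 2483, 5305])
print([round(v, 4) for v in ex9])
print([round(v, 4) for v in ex10])
```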


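11. In Exercise 9, it is found that

$$\sum_{i=1}^{10} (y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2})^2 = 102.28.$$

In Exercise 10, it is found that

$$\sum_{i=1}^{10} (y_i - \hat\beta_0 x_{i1} - \hat\beta_1 x_{i2} - \hat\beta_2 x_{i2}^2)^2 = 42.72.$$

Therefore, a better fit is obtained in Exercise 10.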


11.2 Regression


Commentary 


The regression fallacy is an interesting issue that students ought to see. The description of the regression 
fallacy appears in Exercise 19 of this section. The discussion at the end of the section on “Design of the 
Experiment” is mostly of mathematical interest and could be skipped without disrupting the flow of material. 

If one is using the software R, the variances and covariance of the least-squares estimators can be computed
using the function ls.diag or the function summary.lm. The first takes as argument the result of lsfit, and
the second takes as argument the result of lm. Both functions return a list containing a matrix that can be
extracted via $cov.unscaled. For example, using the notation in the Commentary to Sec. 11.1 above, if we
had used lsfit, then morefit=ls.diag(regfit) would contain the matrix morefit$cov.unscaled. This
matrix, multiplied by the unknown parameter $\sigma^2$, would contain the variances of the least-squares estimators
on its diagonal and the covariances between them in the off-diagonal locations. (If we had used lm, then
morefit=summary.lm(regfit) would be used.)


Solutions to Exercises 


1. After we have replaced $\beta_0$ and $\beta_1$ in (11.2.2) with $\hat\beta_0$ and $\hat\beta_1$, the maximization with respect to $\sigma^2$ is
exactly the same as the maximization carried out in Example 7.5.6 in the text for finding the M.L.E.
of $\sigma^2$.


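2. Since $E(Y_i) = \beta_0 + \beta_1 x_i$, it follows from Eq. (11.2.7) that

$$
E(\hat\beta_1) = \frac{\sum_{i=1}^n (x_i - \bar{x}_n)(\beta_0 + \beta_1 x_i)}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}
= \frac{\beta_0 \sum_{i=1}^n (x_i - \bar{x}_n) + \beta_1 \sum_{i=1}^n x_i (x_i - \bar{x}_n)}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}.
$$

But $\sum_{i=1}^n (x_i - \bar{x}_n) = 0$ and

$$\sum_{i=1}^n x_i (x_i - \bar{x}_n) = \sum_{i=1}^n x_i (x_i - \bar{x}_n) - \bar{x}_n \sum_{i=1}^n (x_i - \bar{x}_n) = \sum_{i=1}^n (x_i - \bar{x}_n)^2.$$

It follows that $E(\hat\beta_1) = \beta_1$.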


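3. $E(\bar{Y}_n) = \frac{1}{n} \sum_{i=1}^n E(Y_i) = \frac{1}{n} \sum_{i=1}^n (\beta_0 + \beta_1 x_i) = \beta_0 + \beta_1 \bar{x}_n$.

Hence, as shown near the end of the proof of Theorem 11.2.2,

$$E(\hat\beta_0) = E(\bar{Y}_n) - \bar{x}_n E(\hat\beta_1) = (\beta_0 + \beta_1 \bar{x}_n) - \bar{x}_n \beta_1 = \beta_0.$$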


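4. Let $s_x^2 = \sum_{i=1}^n (x_i - \bar{x}_n)^2$. Then

$$\hat\beta_0 = \bar{Y}_n - \bar{x}_n \hat\beta_1 = \sum_{i=1}^n \left[ \frac{1}{n} - \frac{\bar{x}_n}{s_x^2} (x_i - \bar{x}_n) \right] Y_i.$$

Since $Y_1, \ldots, Y_n$ are independent and each has variance $\sigma^2$,

$$
\begin{aligned}
\operatorname{Var}(\hat\beta_0) &= \sum_{i=1}^n \left[ \frac{1}{n} - \frac{\bar{x}_n}{s_x^2} (x_i - \bar{x}_n) \right]^2 \operatorname{Var}(Y_i) \\
&= \sigma^2 \sum_{i=1}^n \left[ \frac{1}{n^2} - \frac{2\bar{x}_n}{n s_x^2} (x_i - \bar{x}_n) + \frac{\bar{x}_n^2}{s_x^4} (x_i - \bar{x}_n)^2 \right] \\
&= \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}_n^2}{s_x^2} \right) = \frac{\sigma^2 \sum_{i=1}^n x_i^2}{n s_x^2},
\end{aligned}
$$

as shown in part (a) of Exercise 2 of Sec. 11.1.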


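5. Since $\bar{Y}_n = \hat\beta_0 + \hat\beta_1 \bar{x}_n$, then

$$\operatorname{Var}(\bar{Y}_n) = \operatorname{Var}(\hat\beta_0) + \bar{x}_n^2 \operatorname{Var}(\hat\beta_1) + 2\bar{x}_n \operatorname{Cov}(\hat\beta_0, \hat\beta_1).$$

Therefore, if $\bar{x}_n \neq 0$,

$$
\begin{aligned}
\operatorname{Cov}(\hat\beta_0, \hat\beta_1) &= \frac{1}{2\bar{x}_n} \left[ \operatorname{Var}(\bar{Y}_n) - \operatorname{Var}(\hat\beta_0) - \bar{x}_n^2 \operatorname{Var}(\hat\beta_1) \right] \\
&= \frac{1}{2\bar{x}_n} \left[ \frac{\sigma^2}{n} - \frac{\sigma^2 \sum_{i=1}^n x_i^2}{n s_x^2} - \frac{\bar{x}_n^2 \sigma^2}{s_x^2} \right] \\
&= \frac{\sigma^2}{2\bar{x}_n} \left[ \frac{s_x^2 - \sum_{i=1}^n x_i^2 - n \bar{x}_n^2}{n s_x^2} \right] \\
&= \frac{\sigma^2}{2\bar{x}_n} \left( \frac{-2n \bar{x}_n^2}{n s_x^2} \right) = \frac{-\bar{x}_n \sigma^2}{s_x^2}.
\end{aligned}
$$

If $\bar{x}_n = 0$, then $\hat\beta_0 = \bar{Y}_n = \frac{1}{n} \sum_{i=1}^n Y_i$, and

$$\operatorname{Cov}(\hat\beta_0, \hat\beta_1) = \operatorname{Cov}\left( \frac{1}{n} \sum_{i=1}^n Y_i, \; \frac{1}{s_x^2} \sum_{j=1}^n x_j Y_j \right) = \frac{1}{n s_x^2} \sum_{i=1}^n \sum_{j=1}^n x_j \operatorname{Cov}(Y_i, Y_j),$$

by Exercise 8 of Sec. 4.6. Since $Y_1, \ldots, Y_n$ are independent and each has variance $\sigma^2$, then $\operatorname{Cov}(Y_i, Y_j) = 0$
for $i \neq j$ and $\operatorname{Cov}(Y_i, Y_j) = \sigma^2$ for $i = j$.

Hence,

$$\operatorname{Cov}(\hat\beta_0, \hat\beta_1) = \frac{\sigma^2}{n s_x^2} \sum_{j=1}^n x_j = 0.$$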


6. Both $\hat\beta_0$ and $\hat\beta_1$ are linear functions of the random variables $Y_1, \ldots, Y_n$, which are independent and
have identical normal distributions. As stated in the text, it can therefore be shown that the joint
distribution of $\hat\beta_0$ and $\hat\beta_1$ is a bivariate normal distribution. It follows from Eq. (11.2.6) that $\hat\beta_0$ and $\hat\beta_1$
are uncorrelated when $\bar{x}_n = 0$. Therefore, by the property of the bivariate normal distribution discussed
in Sec. 5.10, $\hat\beta_0$ and $\hat\beta_1$ are independent when $\bar{x}_n = 0$.


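7. (a) The M.L.E.'s $\hat\beta_0$ and $\hat\beta_1$ are the same as the least squares estimates found in Sec. 11.1 for Table 11.1.
The value of $\hat\sigma^2$ can then be found from Eq. (11.2.3).

(b) Also, $\operatorname{Var}(\hat\beta_0) = 0.2505\sigma^2$ can be determined from Eq. (11.2.5) and $\operatorname{Var}(\hat\beta_1) = 0.0277\sigma^2$ from Eq.
(11.2.9).

(c) It can be found from Eq. (11.2.6) that $\operatorname{Cov}(\hat\beta_0, \hat\beta_1) = -0.0646\sigma^2$. By using the values of $\operatorname{Var}(\hat\beta_0)$
and $\operatorname{Var}(\hat\beta_1)$ found in part (b), we obtain

$$\rho(\hat\beta_0, \hat\beta_1) = \frac{\operatorname{Cov}(\hat\beta_0, \hat\beta_1)}{[\operatorname{Var}(\hat\beta_0) \operatorname{Var}(\hat\beta_1)]^{1/2}} = -0.775.$$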


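8. $\hat\theta = 3\hat\beta_0 - 2\hat\beta_1 + 5 = 1.272$. Since $\hat\theta$ is an unbiased estimator of $\theta$, the M.S.E. of $\hat\theta$ is the same as $\operatorname{Var}(\hat\theta)$
and

$$\operatorname{Var}(\hat\theta) = 9 \operatorname{Var}(\hat\beta_0) + 4 \operatorname{Var}(\hat\beta_1) - 12 \operatorname{Cov}(\hat\beta_0, \hat\beta_1) = 3.140\sigma^2.$$

9. The unbiased estimator is $3\hat\beta_0 + c_1 \hat\beta_1$. The M.S.E. of an unbiased estimator is its variance, and

$$\operatorname{Var}(\hat\theta) = 9 \operatorname{Var}(\hat\beta_0) + 6c_1 \operatorname{Cov}(\hat\beta_0, \hat\beta_1) + c_1^2 \operatorname{Var}(\hat\beta_1).$$

Using the values in Exercise 7, we get

$$\operatorname{Var}(\hat\theta) = \sigma^2 [9 \times 0.2505 - 6c_1 (0.0646) + c_1^2 \, 0.0277].$$

We can minimize this by taking the derivative with respect to $c_1$ and setting the derivative equal to 0.
We get $c_1 = 6.996$.

10. The prediction is $\hat{Y} = \hat\beta_0 + 2\hat\beta_1 = 0.584$. The M.S.E. of this prediction is

$$\operatorname{Var}(\hat{Y}) + \operatorname{Var}(Y) = \operatorname{Var}(\hat\beta_0) + 4 \operatorname{Var}(\hat\beta_1) + 4 \operatorname{Cov}(\hat\beta_0, \hat\beta_1) + \sigma^2 = 1.103\sigma^2.$$

Alternatively, the M.S.E. of $\hat{Y}$ could be calculated from Eq. (11.2.11) with $x = 2$.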
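
The arithmetic in Exercises 7 through 10 all rests on the variance of a linear combination $a_0 \hat\beta_0 + a_1 \hat\beta_1$. A short sketch, working in units of $\sigma^2$ and using the three values quoted in Exercise 7 (the function name is illustrative), reproduces the printed numbers:

```python
from math import sqrt

# Values from Exercise 7, all in units of sigma^2.
var_b0, var_b1, cov_b0_b1 = 0.2505, 0.0277, -0.0646

def var_linear(a0, a1):
    """Variance of a0*b0_hat + a1*b1_hat, in units of sigma^2."""
    return a0**2 * var_b0 + a1**2 * var_b1 + 2 * a0 * a1 * cov_b0_b1

rho = cov_b0_b1 / sqrt(var_b0 * var_b1)   # Exercise 7(c)
var_theta = var_linear(3, -2)             # Exercise 8
c1_opt = -3 * cov_b0_b1 / var_b1          # Exercise 9: root of d/dc1 var_linear(3, c1) = 0
mse_pred = var_linear(1, 2) + 1.0         # Exercise 10: add Var(Y) = sigma^2
print(round(rho, 3), round(var_theta, 3), round(c1_opt, 3), round(mse_pred, 3))
```

The minimizer in Exercise 9 comes from setting $6\operatorname{Cov}(\hat\beta_0, \hat\beta_1) + 2c_1 \operatorname{Var}(\hat\beta_1) = 0$.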


11. By Eq. (11.2.11), the M.S.E. is

$$\left[ \frac{1}{n s_x^2} \sum_{i=1}^n (x_i - x)^2 + 1 \right] \sigma^2.$$

We know that $\sum_{i=1}^n (x_i - x)^2$ will be a minimum (and, hence, the M.S.E. will be a minimum) when
$x = \bar{x}_n$.

12. The M.L.E.'s $\hat\beta_0$ and $\hat\beta_1$ have the same values as the least squares estimates found in part (a) of
Exercise 7 of Sec. 11.1. The value of $\hat\sigma^2$ can then be found from Eq. (11.2.3). Also, $\operatorname{Var}(\hat\beta_0)$ can be
determined from Eq. (11.2.5) and $\operatorname{Var}(\hat\beta_1)$ from Eq. (11.2.9).
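

13. It can be found from Eq. (11.2.6) that $\operatorname{Cov}(\hat\beta_0, \hat\beta_1) = -0.214\sigma^2$. By using the values of $\operatorname{Var}(\hat\beta_0)$ and
$\operatorname{Var}(\hat\beta_1)$ found in Exercise 12, we obtain

$$\rho(\hat\beta_0, \hat\beta_1) = \frac{\operatorname{Cov}(\hat\beta_0, \hat\beta_1)}{[\operatorname{Var}(\hat\beta_0) \operatorname{Var}(\hat\beta_1)]^{1/2}} = -0.891.$$

14. $\hat\theta = 5 - 4\hat\beta_0 + \hat\beta_1 = -158.024$. Since $\hat\theta$ is an unbiased estimator of $\theta$, the M.S.E. is the same as $\operatorname{Var}(\hat\theta)$
and

$$\operatorname{Var}(\hat\theta) = 16 \operatorname{Var}(\hat\beta_0) + \operatorname{Var}(\hat\beta_1) - 8 \operatorname{Cov}(\hat\beta_0, \hat\beta_1) = 11.524\sigma^2.$$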


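15. This exercise is similar to Exercise 9. $\operatorname{Var}(\hat\theta)$ attains its minimum value when $c_1 = -\bar{x}_n$.

16. The prediction is $\hat{Y} = \hat\beta_0 + 3.25\hat\beta_1 = 42.673$. The M.S.E. of this prediction is

$$\operatorname{Var}(\hat{Y}) + \operatorname{Var}(Y) = \operatorname{Var}(\hat\beta_0) + (3.25)^2 \operatorname{Var}(\hat\beta_1) + 6.50 \operatorname{Cov}(\hat\beta_0, \hat\beta_1) + \sigma^2 = 1.220\sigma^2.$$

Alternatively, the M.S.E. of $\hat{Y}$ could be calculated from Eq. (11.2.11) with $x = 3.25$.

17. It was shown in Exercise 11 that the M.S.E. of $\hat{Y}$ will be a minimum when $x = \bar{x}_n = 2.25$.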
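
Exercises 12 through 17 can be reproduced from the summary statistics of Exercise 7 of Sec. 11.1. The sketch below assumes the usual forms of the variance and covariance formulas, namely $\operatorname{Var}(\hat\beta_0) = \sigma^2 \sum x_i^2 / (n s_x^2)$, $\operatorname{Var}(\hat\beta_1) = \sigma^2 / s_x^2$, $\operatorname{Cov}(\hat\beta_0, \hat\beta_1) = -\bar{x}_n \sigma^2 / s_x^2$, and a prediction M.S.E. of $[1 + 1/n + (x - \bar{x}_n)^2 / s_x^2] \sigma^2$; everything is in units of $\sigma^2$ and the variable names are illustrative.

```python
from math import sqrt

n, xbar, sum_x2 = 8, 2.25, 51.0
sx2 = sum_x2 - n * xbar**2            # s_x^2 = 10.5

var_b0 = sum_x2 / (n * sx2)           # Var(b0_hat) / sigma^2
var_b1 = 1.0 / sx2                    # Var(b1_hat) / sigma^2
cov = -xbar / sx2                     # Cov(b0_hat, b1_hat) / sigma^2

rho = cov / sqrt(var_b0 * var_b1)             # Exercise 13
var_theta = 16 * var_b0 + var_b1 - 8 * cov    # Exercise 14

def mse_prediction(x):
    """M.S.E. of the prediction at x, in units of sigma^2."""
    return 1 + 1 / n + (x - xbar) ** 2 / sx2

print(round(cov, 3), round(rho, 3), round(var_theta, 3),
      round(mse_prediction(3.25), 3))
```

Evaluating `mse_prediction` on a grid of x values shows the minimum at $x = \bar{x}_n = 2.25$, as claimed in Exercise 17.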


18. (a) It is easiest to use a computer to find the least-squares coefficients. These are $\hat\beta_0 = -1.234$ and
$\hat\beta_1 = 2.702$.

(b) The predicted 1980 selling price for a species that sold for $x = 21.4$ in 1970 is

$$\hat\beta_0 + \hat\beta_1 x = -1.234 + 2.702 \times 21.4 = 56.59.$$

(c) The average of the $x_i$ values is 41.1, and $s_x^2 = 18430$. Use Eq. (11.2.11) to compute the M.S.E. as

$$\sigma^2 \left[ 1 + \frac{1}{14} + \frac{(21.4 - 41.1)^2}{18430} \right] = 1.093\sigma^2.$$

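19. The formula for $E(X_2 | x_1)$ is Eq. (5.10.6), which we repeat here for the case in which $\mu_1 = \mu_2 = \mu$ and
$\sigma_1 = \sigma_2 = \sigma$:

$$E(X_2 | x_1) = \mu + \rho\sigma \left( \frac{x_1 - \mu}{\sigma} \right) = \mu + \rho(x_1 - \mu).$$

We are asked to show that $|E(X_2 | x_1) - \mu| < |x_1 - \mu|$ for all $x_1$. Since $0 < \rho < 1$,

$$|E(X_2 | x_1) - \mu| = |\mu + \rho(x_1 - \mu) - \mu| = \rho |x_1 - \mu| < |x_1 - \mu|.$$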


11.3. Statistical Inference in Simple Linear Regression 


Commentary 


Computation and plotting of residuals is really only feasible with the help of a computer, except in problems
that are so small that you can't learn much from residuals anyway. There is a subsection at the end of this
section on joint inference about $\beta_0$ and $\beta_1$. This material is mathematically more challenging than the rest
of the section and might be suitable only for special sets of students.

If one is using the software R, both lm and lsfit provide the residuals. These can then be plotted
against any other available variables using plot. Normal quantile plots are done easily using qqnorm with
one argument being the residuals. The function qqline (with the same argument) will add a straight line to
the plot to help identify curvature and outliers.


Solutions to Exercises


1. It is found from Table 11.9 that $\bar{x}_n = 0.42$, $\bar{y}_n = 0.33$, $\sum_{i=1}^n x_i^2 = 10.16$, $\sum_{i=1}^n x_i y_i = 5.04$, $\hat\beta_1 = 0.435$ and
$\hat\beta_0 = 0.147$ by Eq. (11.1.1), and $S^2 = 0.451$ by Eq. (11.3.9). Therefore, from Eq. (11.3.19) with $n = 10$
and $\beta_0^* = 0.7$, it is found that $U_0 = -6.695$. It is found from a table of the $t$ distribution with $n - 2 = 8$
degrees of freedom that to carry out a test at the 0.05 level of significance, $H_0$ should be rejected if
$|U_0| > 2.306$. Therefore $H_0$ is rejected.


2. In this exercise, we must test the following hypotheses:

$$H_0: \beta_0 = 0, \qquad H_1: \beta_0 \neq 0.$$

Hence, $\beta_0^* = 0$ and it is found from Eq. (11.3.19) that $U_0 = 1.783$. Since $|U_0| < 2.306$, the critical value
found in Exercise 1, we should not reject $H_0$.

3. It follows from Eq. (11.3.22), with $\beta_1^* = 1$, that $U_1 = -6.894$. Since $|U_1| > 2.306$, the critical value
found in Exercise 1, we should reject $H_0$.

4. In this exercise, we want to test the following hypotheses:

$$H_0: \beta_1 = 0, \qquad H_1: \beta_1 \neq 0.$$

Hence, $\beta_1^* = 0$ and it is found from Eq. (11.3.22) that $U_1 = 5.313$. Since $|U_1| > 2.306$, we should reject
$H_0$.
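
The four test statistics in Exercises 1 through 4 can be recomputed from the summary statistics in Exercise 1. The sketch below assumes the usual forms $U_0 = (\hat\beta_0 - \beta_0^*) / (\sigma' [1/n + \bar{x}_n^2 / s_x^2]^{1/2})$ and $U_1 = s_x (\hat\beta_1 - \beta_1^*) / \sigma'$ with $\sigma' = [S^2 / (n-2)]^{1/2}$, and it takes the critical value 2.306 from the text rather than computing a $t$ quantile. Because it carries unrounded estimates, its output can differ from the printed values in the third decimal place.

```python
from math import sqrt

n, xbar, ybar = 10, 0.42, 0.33
sum_x2, sum_xy, S2 = 10.16, 5.04, 0.451

sx2 = sum_x2 - n * xbar**2                  # s_x^2
b1 = (sum_xy - n * xbar * ybar) / sx2       # slope estimate
b0 = ybar - b1 * xbar                       # intercept estimate
sigma_prime = sqrt(S2 / (n - 2))

def u0(beta0_star):
    """t statistic for H0: beta0 = beta0_star."""
    return (b0 - beta0_star) / (sigma_prime * sqrt(1 / n + xbar**2 / sx2))

def u1(beta1_star):
    """t statistic for H0: beta1 = beta1_star."""
    return sqrt(sx2) * (b1 - beta1_star) / sigma_prime

T_CRIT = 2.306  # 0.975 quantile of t with 8 degrees of freedom, as quoted in Exercise 1
for label, stat in [("Ex. 1", u0(0.7)), ("Ex. 2", u0(0.0)),
                    ("Ex. 3", u1(1.0)), ("Ex. 4", u1(0.0))]:
    print(label, round(stat, 3), "reject" if abs(stat) > T_CRIT else "do not reject")
```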


5. The hypotheses to be tested are:

$$H_0: 5\beta_0 - \beta_1 = 0, \qquad H_1: 5\beta_0 - \beta_1 \neq 0.$$

Hence, in the notation of (11.3.13), $c_0 = 5$, $c_1 = -1$, and $c_* = 0$. It is found that $\sum_{i=1}^n (c_0 x_i - c_1)^2 = 306$
and, from Eq. (11.3.14), that $U_{01} = 0.664$. It is found from a table of the $t$ distribution with $n - 2 = 8$
degrees of freedom that to carry out a test at the 0.10 level of significance, $H_0$ should be rejected if
$|U_{01}| > 1.860$. Therefore, $H_0$ is not rejected.

6. The hypotheses to be tested are:

$$H_0: \beta_0 + \beta_1 = 1, \qquad H_1: \beta_0 + \beta_1 \neq 1.$$

Therefore, $c_0 = c_1 = c_* = 1$. It is found that $\sum_{i=1}^n (c_0 x_i - c_1)^2 = 11.76$ and, from Eq. (11.3.14), that
$U_{01} = -4.701$. It is found from a table of the $t$ distribution with $n - 2 = 8$ degrees of freedom that to
carry out a test at the 0.01 level of significance, $H_0$ should be rejected if $|U_{01}| > 3.355$. Therefore, $H_0$
is rejected.
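

7. $$
\begin{aligned}
\operatorname{Cov}(\hat\beta_1, D) &= \operatorname{Cov}(\hat\beta_1, \hat\beta_0 + \hat\beta_1 \bar{x}_n) \\
&= \operatorname{Cov}(\hat\beta_1, \hat\beta_0) + \bar{x}_n \operatorname{Cov}(\hat\beta_1, \hat\beta_1) \\
&= \operatorname{Cov}(\hat\beta_0, \hat\beta_1) + \bar{x}_n \operatorname{Var}(\hat\beta_1) \\
&= 0, \quad \text{by Eqs. (11.2.9) and (11.2.6).}
\end{aligned}
$$

Since $\hat\beta_0$ and $\hat\beta_1$ have a bivariate normal distribution, it follows from Exercise 10 of Sec. 5.10 that $D$
and $\hat\beta_1$ will also have a bivariate normal distribution. Therefore, as discussed in Sec. 5.10, since $D$ and
$\hat\beta_1$ are uncorrelated they are also independent.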


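8. (a) We shall add $n \bar{x}_n^2 (\hat\beta_1 - \beta_1^*)^2$ and subtract the same amount to the right side of $Q^2$, as given by
Eq. (11.3.30). Then $Q^2$ can be rewritten as follows:

$$
\begin{aligned}
Q^2 &= \left( \sum_{i=1}^n x_i^2 - n \bar{x}_n^2 \right) (\hat\beta_1 - \beta_1^*)^2 + n \left[ (\hat\beta_0 - \beta_0^*)^2 + 2\bar{x}_n (\hat\beta_0 - \beta_0^*)(\hat\beta_1 - \beta_1^*) + \bar{x}_n^2 (\hat\beta_1 - \beta_1^*)^2 \right] \\
&= \sigma^2 \frac{(\hat\beta_1 - \beta_1^*)^2}{\operatorname{Var}(\hat\beta_1)} + n \left[ (\hat\beta_0 - \beta_0^*) + \bar{x}_n (\hat\beta_1 - \beta_1^*) \right]^2.
\end{aligned}
$$

Hence,

$$\frac{Q^2}{\sigma^2} = \frac{(\hat\beta_1 - \beta_1^*)^2}{\operatorname{Var}(\hat\beta_1)} + \frac{n}{\sigma^2} (D - \beta_0^* - \beta_1^* \bar{x}_n)^2.$$

It remains to show that $\operatorname{Var}(D) = \sigma^2 / n$. But

$$\operatorname{Var}(D) = \operatorname{Var}(\hat\beta_0) + \bar{x}_n^2 \operatorname{Var}(\hat\beta_1) + 2\bar{x}_n \operatorname{Cov}(\hat\beta_0, \hat\beta_1).$$

The desired result can now be obtained from Eqs. (11.2.9), (11.2.5), and (11.2.6).

(b) It follows from Exercise 7 that the random variables $\hat\beta_1$ and $D$ are independent and each has a
normal distribution. When $H_0$ is true, $E(\hat\beta_1) = \beta_1^*$ and $E(D) = \beta_0^* + \beta_1^* \bar{x}_n$. Hence, when $H_0$ is true,
each of the two summands on the right side of the equation given in part (a) is the square of a
random variable having a standard normal distribution.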


9. Here, $\beta_0^* = 0$ and $\beta_1^* = 1$. It is found that $Q^2 = 2.759$, $S^2 = 0.451$, and $U^2 = 24.48$. It is found from a
table of the $F$ distribution with 2 and 8 degrees of freedom that to carry out a test at the 0.05 level of
significance, $H_0$ should be rejected if $U^2 > 4.46$. Therefore, $H_0$ is rejected.


10. To attain a confidence coefficient of 0.95, it is found from a table of the $t$ distribution with 8 degrees of
freedom that the confidence interval will contain all values of $\beta_0^*$ for which $|U_0| < 2.306$. When we use
the numerical values found in Exercise 1, we find that this is the interval of all values of $\beta_0^*$ such that
$-2.306 < 12.111(0.147 - \beta_0^*) < 2.306$ or, equivalently, $-0.043 < \beta_0^* < 0.338$. This interval is, therefore,
the confidence interval for $\beta_0$.


11. The solution here is analogous to the solution of Exercise 9. Since the confidence coefficient is again
0.95, the confidence interval will contain all values of $\beta_1^*$ for which $|U_1| < 2.306$ or, equivalently, for
which $-2.306 < 12.207(0.435 - \beta_1^*) < 2.306$. The interval is, therefore, found to be $0.246 < \beta_1 < 0.624$.


12. We shall first determine a confidence interval for $5\beta_0 - \beta_1$ with confidence coefficient 0.90. It is found
from a table of the $t$ distribution with 8 degrees of freedom (as in Exercise 5) that this confidence
interval will contain all values of $c_*$ for which $|U_{01}| < 1.860$ or, equivalently, for which $-1.860 <
2.207(0.301 - c_*) < 1.860$. This interval reduces to $-0.542 < c_* < 1.144$. Since this is a confidence
interval for $5\beta_0 - \beta_1$, the corresponding confidence interval for $5\beta_0 - \beta_1 + 4$ is the interval with end
points $(-0.542) + 4 = 3.458$ and $(1.144) + 4 = 5.144$.


13. We must determine a confidence interval for $y = \beta_0 + \beta_1$ with confidence coefficient 0.99. It is found from
a table of the $t$ distribution with 8 degrees of freedom (as in Exercise 6) that this confidence interval will
contain all values of $c_*$ for which $|U_{01}| < 3.355$ or, equivalently, for which $-3.355 < 11.257(0.582 - c_*) <
3.355$. This interval reduces to $0.284 < c_* < 0.880$. This interval is, therefore, the confidence interval
for $y$.


14. We must determine a confidence interval for $y = \beta_0 + 0.42\beta_1$. Since the confidence coefficient is again
0.99, as in Exercise 13, this interval will again contain all values of $c_*$ for which $|U_{01}| < 3.355$. Since
$c_0 = 1$ and $c_1 = 0.42 = \bar{x}_n$ in this exercise, the value of $\sum_{i=1}^n (c_0 x_i - c_1)^2$, which is needed in determining
$U_{01}$, is equal to $\sum_{i=1}^n (x_i - \bar{x}_n)^2 = 8.396$. Also, $c_0 \hat\beta_0 + c_1 \hat\beta_1 = \hat\beta_0 + \hat\beta_1 \bar{x}_n = \bar{y}_n = 0.33$. Hence it is found
that the confidence interval for $y$ contains all values of $c_*$ for which $-3.355 < 13.322(0.33 - c_*) < 3.355$
or, equivalently, for which $0.078 < c_* < 0.582$.
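
The intervals in Exercises 10 through 14 all come from inverting the same $t$ statistic, so they can be produced by one helper. The sketch below assumes the standard error of $c_0 \hat\beta_0 + c_1 \hat\beta_1$ has the form $\sigma' [c_0^2 / n + (c_0 \bar{x}_n - c_1)^2 / s_x^2]^{1/2}$ and takes the $t$ quantiles 2.306, 1.860, and 3.355 from the text; the function name is illustrative.

```python
from math import sqrt

n, xbar, ybar = 10, 0.42, 0.33
sum_x2, sum_xy, S2 = 10.16, 5.04, 0.451

sx2 = sum_x2 - n * xbar**2
b1 = (sum_xy - n * xbar * ybar) / sx2
b0 = ybar - b1 * xbar
sigma_prime = sqrt(S2 / (n - 2))

def interval(c0, c1, t_quantile):
    """Confidence interval for c0*beta0 + c1*beta1 given the relevant t quantile."""
    center = c0 * b0 + c1 * b1
    se = sigma_prime * sqrt(c0**2 / n + (c0 * xbar - c1) ** 2 / sx2)
    return (center - t_quantile * se, center + t_quantile * se)

ex10 = interval(1, 0, 2.306)      # beta0, coefficient 0.95
ex11 = interval(0, 1, 2.306)      # beta1, coefficient 0.95
ex12 = interval(5, -1, 1.860)     # 5*beta0 - beta1, coefficient 0.90
ex13 = interval(1, 1, 3.355)      # beta0 + beta1, coefficient 0.99
ex14 = interval(1, 0.42, 3.355)   # beta0 + 0.42*beta1, coefficient 0.99
for name, (lo, hi) in [("10", ex10), ("11", ex11), ("12", ex12), ("13", ex13), ("14", ex14)]:
    print(name, round(lo, 3), round(hi, 3))
```

The reciprocal of the standard error is the multiplier that appears in the text (for example, 12.111 in Exercise 10), up to rounding.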


15. Let $q$ be the $1 - \alpha_0/2$ quantile of the $t$ distribution with $n - 2$ degrees of freedom. A confidence interval for
$\beta_0 + \beta_1 x$ contains all values of $c_*$ for which $|U_{01}| < q$, where $c_0 = 1$ and $c_1 = x$ in Eq. (11.3.14). The
inequality $|U_{01}| < q$ can be reduced to the following form:

$$
\hat\beta_0 + x\hat\beta_1 - q \left[ \frac{S^2 \sum_{i=1}^n (x_i - x)^2}{n(n-2) s_x^2} \right]^{1/2}
< c_* <
\hat\beta_0 + x\hat\beta_1 + q \left[ \frac{S^2 \sum_{i=1}^n (x_i - x)^2}{n(n-2) s_x^2} \right]^{1/2}.
$$

The length of this interval is

$$2q \left[ \frac{S^2 \sum_{i=1}^n (x_i - x)^2}{n(n-2) s_x^2} \right]^{1/2}.$$

The length will, therefore, be a minimum for the value of $x$ which minimizes $\sum_{i=1}^n (x_i - x)^2$. We know
that this quantity is a minimum when $x = \bar{x}_n$.


16. It is known from elementary calculus that the set of points $(x, y)$ which satisfy an inequality of the
form $Ax^2 + Bxy + Cy^2 < c^2$ will be an ellipse (with center at the origin) if and only if $B^2 - 4AC < 0$.
It follows from Eqs. (11.3.30) and (11.3.32) that $U^2 < \gamma$ if and only if

$$n(\beta_0^* - \hat\beta_0)^2 + 2n\bar{x}_n (\beta_0^* - \hat\beta_0)(\beta_1^* - \hat\beta_1) + \left( \sum_{i=1}^n x_i^2 \right) (\beta_1^* - \hat\beta_1)^2 < \frac{2}{n-2} \gamma S^2.$$

Hence, the set of points $(\beta_0^*, \beta_1^*)$ which satisfy this inequality will be an ellipse [with center at $(\hat\beta_0, \hat\beta_1)$]
if and only if

$$(2n\bar{x}_n)^2 - 4n \sum_{i=1}^n x_i^2 < 0$$

or, equivalently, if and only if

$$\sum_{i=1}^n x_i^2 - n\bar{x}_n^2 > 0.$$

Since the left side of this relation is equal to $\sum_{i=1}^n (x_i - \bar{x}_n)^2$, it must be positive, assuming that the
numbers $x_1, \ldots, x_n$ are not all the same.


17. To attain a confidence coefficient of 0.95, it is found from a table of the $F$ distribution with 2 and
8 degrees of freedom (as in Exercise 9) that the confidence ellipse for $(\beta_0, \beta_1)$ will contain all points
$(\beta_0^*, \beta_1^*)$ for which $U^2 < 4.46$. Hence, it will contain all points for which

$$10(\beta_0^* - 0.147)^2 + 8.4(\beta_0^* - 0.147)(\beta_1^* - 0.435) + 10.16(\beta_1^* - 0.435)^2 < 0.503.$$
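
Membership in this confidence ellipse is easy to test numerically. The sketch below evaluates the quadratic form on the left side of the inequality, using the rounded estimates 0.147 and 0.435, and compares it with the bound $2 \gamma S^2 / (n - 2)$ for $\gamma = 4.46$; the function names are illustrative.

```python
def q2(b0_star, b1_star, b0_hat=0.147, b1_hat=0.435, n=10, xbar=0.42, sum_x2=10.16):
    """Quadratic form Q^2 evaluated at the point (b0_star, b1_star)."""
    d0 = b0_star - b0_hat
    d1 = b1_star - b1_hat
    return n * d0**2 + 2 * n * xbar * d0 * d1 + sum_x2 * d1**2

threshold = 2 * 4.46 * 0.451 / 8   # 2 * gamma * S^2 / (n - 2)

def in_ellipse(b0_star, b1_star):
    return q2(b0_star, b1_star) < threshold

print(round(threshold, 3), in_ellipse(0.147, 0.435), in_ellipse(0.0, 1.0))
```

The point $(0, 1)$ tested in Exercise 9 lies outside the ellipse, which matches the rejection of $H_0$ there.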


18. (a) The upper and lower limits of the confidence band are defined by (11.3.33). In this exercise, $n = 10$
and $(2\gamma)^{1/2} = 2.987$. The values of $\hat\beta_0$, $\hat\beta_1$, and $S^2$ have been found in Exercise 1.
Numerical computation yields the following points on the upper and lower limits of the confidence
band:

      x                  Upper limit    Lower limit
      -2                    -.090         -1.356
      -1                     .124          -.700
       0                     .395          -.101
      \bar{x}_n = 0.42       .554           .106
       1                     .848           .316
       2                    1.465           .569

The upper and lower limits containing these points are shown as the solid curves in Fig. S.11.1.


(b) The upper and lower limits are now given by (11.3.25), where $T_{n-2}^{-1}(1 - \alpha_0/2) = 2.306$. The
corresponding values of these upper and lower limits are as follows:

      x                  Upper limit    Lower limit
      -2                    -.234         -1.212
      -1                     .030          -.606
       0                     .338          -.044
      \bar{x}_n = 0.42       .503           .157
       1                     .787           .377
       2                    1.363           .671

These upper and lower limits are shown as the dashed curves in Fig. S.11.1.
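
Both tables can be regenerated from the summary statistics of Exercise 1. The sketch below assumes that the limits at each $x$ have the form $\hat\beta_0 + \hat\beta_1 x \pm m \, \sigma' [1/n + (x - \bar{x}_n)^2 / s_x^2]^{1/2}$, with multiplier $m = 2.987$ for the band in part (a) and $m = 2.306$ for the pointwise intervals in part (b). Small differences in the third decimal place relative to the printed tables are to be expected from rounding.

```python
from math import sqrt

n, xbar, ybar = 10, 0.42, 0.33
sum_x2, sum_xy, S2 = 10.16, 5.04, 0.451

sx2 = sum_x2 - n * xbar**2
b1 = (sum_xy - n * xbar * ybar) / sx2
b0 = ybar - b1 * xbar
sigma_prime = sqrt(S2 / (n - 2))

def band(x, multiplier):
    """Return (upper, lower) limits at x for the given multiplier."""
    center = b0 + b1 * x
    half = multiplier * sigma_prime * sqrt(1 / n + (x - xbar) ** 2 / sx2)
    return center + half, center - half

for x in (-2, -1, 0, xbar, 1, 2):
    up_a, lo_a = band(x, 2.987)   # part (a): simultaneous band
    up_b, lo_b = band(x, 2.306)   # part (b): pointwise intervals
    print(f"{x:6.2f}  {up_a:7.3f} {lo_a:7.3f}   {up_b:7.3f} {lo_b:7.3f}")
```

At every $x$ the band from part (a) is wider than the interval from part (b), since $2.987 > 2.306$.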


19. If $S^2$ is defined by Eq. (11.3.9), then $S^2/\sigma^2$ has a $\chi^2$ distribution with $n - 2$ degrees of freedom.
Therefore, $E(S^2/\sigma^2) = n - 2$, $E(S^2) = (n - 2)\sigma^2$, and $E(S^2/[n - 2]) = \sigma^2$.


20. (a) The prediction is $\hat\beta_0 + \hat\beta_1 X = 68.17 - 1.112 \times 24 = 41.482$.


(b) The 95% prediction interval is centered at the prediction from part (a) and has half-width equal
to

$$T_{30}^{-1}(0.975) \, 4.281 \left[ 1 + \frac{1}{32} + \frac{(24 - 30.91)^2}{2054.8} \right]^{1/2} = 8.978.$$

So, the interval is $41.482 \pm 8.978 = [32.50, 50.46]$.
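
The half-width calculation can be sketched as follows. The sample size $n = 32$ is inferred from the 30 degrees of freedom, and the quantile 2.042 is the tabled 0.975 quantile of the $t$ distribution with 30 degrees of freedom; both are assumptions of this sketch rather than values stated explicitly above, and the function name is illustrative.

```python
from math import sqrt

def prediction_interval(b0, b1, x, sigma_prime, n, xbar, sx2, t_quantile):
    """Return (center, half_width) of the prediction interval at x."""
    center = b0 + b1 * x
    half = t_quantile * sigma_prime * sqrt(1 + 1 / n + (x - xbar) ** 2 / sx2)
    return center, half

center, half = prediction_interval(b0=68.17, b1=-1.112, x=24, sigma_prime=4.281,
                                   n=32, xbar=30.91, sx2=2054.8, t_quantile=2.042)
print(round(center, 3), round(half, 2), round(center - half, 2), round(center + half, 2))
```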


Figure S.11.1: Confidence bands and intervals for Exercise 18a in Sec. 11.3.


21. (a) A computer is useful to perform the regressions and plots for this exercise. The two plots for parts
(a) and (b) are side-by-side in Fig. S.11.2. The plot for part (a) shows residuals that are more
spread out for larger values of 1970 price than they are for smaller values. This suggests that the
variance of $Y$ is not constant as $X$ changes.


(b) The plot for part (b) in Fig. S.11.2 has more uniform spread in the residuals as 1970 price varies.
However, there appear to be two points that are not fit very well.


22. In this problem we are asked to regress logarithm of 1980 fish price on the 1970 fish price. (It would
have made more sense to regress on the logarithm of 1970 fish price, but the problem didn't ask for
that.) The summary of the regression fit is $\hat\beta_0 = 3.099$, $\hat\beta_1 = 0.0266$, $\sigma' = 0.6641$, $\bar{x}_n = 41.1$, and
$s_x^2 = 18430$.


(a) The test statistic is given in Eq. (11.3.22),

$$U = s_x \frac{\hat\beta_1 - 2}{\sigma'} = 135.8 \, \frac{0.0266 - 2}{0.6641} = -403.5.$$

We would reject $H_0$ at level 0.01 if $U$ is greater than the 0.99 quantile of the $t$ distribution with
12 degrees of freedom. We do not reject the null hypothesis at level 0.01.


(b) A 90% confidence interval is centered at 0.0266 and has half-width equal to

$$T_{12}^{-1}(0.95) \frac{\sigma'}{s_x} = 1.782 \, \frac{0.6641}{135.8} = 0.00872.$$

So, the interval is $0.0266 \pm 0.00872 = [0.0179, 0.0353]$.


Figure S.11.2: Residual plots for Exercise 21a of Sec. 11.3 (residuals plotted against 1970 price and against
log-1970 price). The plot on the left is for part (a) and the plot on the right is for part (b).


(c) A 90% prediction interval for the logarithm of 1980 price is centered at $3.099 + 0.0266 \times 21.4 = 3.668$
and has half-width equal to

$$T_{12}^{-1}(0.95) \, \sigma' \left[ 1 + \frac{1}{n} + \frac{(x - \bar{x}_n)^2}{s_x^2} \right]^{1/2} = 1.782 \times 0.6641 \left[ 1 + \frac{1}{14} + \frac{(21.4 - 41.1)^2}{18430} \right]^{1/2} = 1.237.$$

So, the interval for the logarithm of 1980 price is $3.668 \pm 1.237 = [2.431, 4.905]$. To convert this
to 1980 price take $e$ to the power of both endpoints to get $[11.37, 134.96]$.


If we had been asked to regress on the logarithm of 1970 fish price, the summary results would have been
$\hat\beta_0 = 1.132$, $\hat\beta_1 = 0.9547$, $\sigma' = 0.2776$, $\bar{x}_n = 3.206$, and $s_x^2 = 19.11$. The test statistic for part (a) would
have been $4.371(0.9547 - 2)/0.2776 = -16.46$. Still, we would not reject the null hypothesis at level
0.01. The confidence interval for $\beta_1$ would have been $0.9547 \pm 1.782(0.2776/4.371) = [0.8415, 1.067]$.
The prediction interval for the logarithm of price would have been

$$1.132 + 0.9547 \log(21.4) \pm 0.2776 \left[ 1 + \frac{1}{14} + \frac{(\log(21.4) - 3.206)^2}{19.11} \right]^{1/2} = [3.769, 4.344].$$

The interval for 1980 price would then be $[43.34, 77.02]$.


23. Define

$$W_{01} = \left[ \frac{c_0^2}{n} + \frac{(c_0 \bar{x}_n - c_1)^2}{s_x^2} \right]^{-1/2} \frac{c_0 (\hat\beta_0 - \beta_0) + c_1 (\hat\beta_1 - \beta_1)}{\sigma'},$$

which has the $t$ distribution with $n - 2$ degrees of freedom. Hence

$$\Pr(W_{01} \ge T_{n-2}^{-1}(1 - \alpha_0)) = \alpha_0.$$

Suppose that $c_0 \beta_0 + c_1 \beta_1 < c_*$. Because $[(c_0^2/n) + (c_0 \bar{x}_n - c_1)^2 / s_x^2]^{-1/2} / \sigma' > 0$, it follows that $W_{01} > U_{01}$.

Finally, the probability of type I error is

$$\Pr(U_{01} \ge T_{n-2}^{-1}(1 - \alpha_0)) \le \Pr(W_{01} \ge T_{n-2}^{-1}(1 - \alpha_0)) = \alpha_0,$$

where the first inequality follows from $W_{01} > U_{01}$.
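

24. (a) When $c_0 = 1$ and $c_1 = x > 0$, the smallest possible value of $\beta_0^* + x\beta_1^*$ occurs at the smallest values
of $\beta_0^*$ and $\beta_1^*$ simultaneously. These values are

$$\hat\beta_0 - \sigma' \left[ \frac{1}{n} + \frac{\bar{x}_n^2}{s_x^2} \right]^{1/2} T_{n-2}^{-1}\left(1 - \frac{\alpha_0}{4}\right), \qquad \hat\beta_1 - \frac{\sigma'}{s_x} T_{n-2}^{-1}\left(1 - \frac{\alpha_0}{4}\right).$$

Similarly, the largest values occur when $\beta_0^*$ and $\beta_1^*$ both take their largest possible values, namely

$$\hat\beta_0 + \sigma' \left[ \frac{1}{n} + \frac{\bar{x}_n^2}{s_x^2} \right]^{1/2} T_{n-2}^{-1}\left(1 - \frac{\alpha_0}{4}\right), \qquad \hat\beta_1 + \frac{\sigma'}{s_x} T_{n-2}^{-1}\left(1 - \frac{\alpha_0}{4}\right).$$

The confidence interval is then

$$\hat\beta_0 + \hat\beta_1 x \pm \sigma' \left( \left[ \frac{1}{n} + \frac{\bar{x}_n^2}{s_x^2} \right]^{1/2} + \frac{x}{s_x} \right) T_{n-2}^{-1}\left(1 - \frac{\alpha_0}{4}\right).$$

(b) When $c_0 = 1$ and $c_1 = x < 0$, the smallest possible value of $\beta_0^* + x\beta_1^*$ occurs when $\beta_0^*$ takes its
smallest possible value and $\beta_1^*$ takes its largest possible value. Similarly, the largest possible value
of $\beta_0^* + x\beta_1^*$ occurs when $\beta_0^*$ takes its largest possible value and $\beta_1^*$ takes its smallest possible value.
All of these extreme values are given in part (a). The resulting interval is then

$$\hat\beta_0 + \hat\beta_1 x \pm \sigma' \left( \left[ \frac{1}{n} + \frac{\bar{x}_n^2}{s_x^2} \right]^{1/2} - \frac{x}{s_x} \right) T_{n-2}^{-1}\left(1 - \frac{\alpha_0}{4}\right).$$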
25. (a) The simultaneous intervals are the same as (11.3.33) with $[2F_{2,n-2}^{-1}(1 - \alpha_0)]^{1/2}$ replaced by
$T_{n-2}^{-1}(1 - \alpha_0/4)$, namely for $i = 0, 1$,

$$\hat\beta_0 + \hat\beta_1 x_i \pm T_{n-2}^{-1}(1 - \alpha_0/4) \, \sigma' \left[ \frac{1}{n} + \frac{(x_i - \bar{x}_n)^2}{s_x^2} \right]^{1/2}.$$

(b) Set $x = \alpha x_0 + (1 - \alpha) x_1$, and solve for $\alpha$. The result is, by straightforward algebra,

$$\alpha(x) = \frac{x - x_1}{x_0 - x_1}.$$

(c) First, notice that for all $x$,

$$\beta_0 + \beta_1 x = \alpha(x)[\beta_0 + \beta_1 x_0] + [1 - \alpha(x)][\beta_0 + \beta_1 x_1]. \qquad \text{(S.11.1)}$$

That is, each parameter for which we want a confidence interval is a convex combination of the
parameters for which we already have confidence intervals.

Suppose that $C$ occurs. There are three cases that depend on where $\alpha(x)$ lies relative to the interval
$[0, 1]$. The first case is when $0 \le \alpha(x) \le 1$. In this case, the smallest of the four numbers defining
$L(x)$ and $U(x)$ is $L(x) = \alpha(x) A_0 + [1 - \alpha(x)] A_1$ and the largest is $U(x) = \alpha(x) B_0 + [1 - \alpha(x)] B_1$,
because both $\alpha(x)$ and $1 - \alpha(x)$ are nonnegative. For all such $x$, $A_0 < \beta_0 + \beta_1 x_0 < B_0$ and
$A_1 < \beta_0 + \beta_1 x_1 < B_1$ together imply that

$$\alpha(x) A_0 + [1 - \alpha(x)] A_1 < \alpha(x)[\beta_0 + \beta_1 x_0] + [1 - \alpha(x)][\beta_0 + \beta_1 x_1] < \alpha(x) B_0 + [1 - \alpha(x)] B_1.$$

Combining this with (S.11.1) and the formulas for $L(x)$ and $U(x)$ yields $L(x) < \beta_0 + \beta_1 x < U(x)$
as desired. The other two cases are similar, so we shall do only one of them. If $\alpha(x) < 0$,
then $1 - \alpha(x) > 0$. In this case, the smallest of the four numbers defining $L(x)$ and $U(x)$ is
$L(x) = \alpha(x) B_0 + [1 - \alpha(x)] A_1$, and the largest is $U(x) = \alpha(x) A_0 + [1 - \alpha(x)] B_1$. For all such $x$,
$A_0 < \beta_0 + \beta_1 x_0 < B_0$ and $A_1 < \beta_0 + \beta_1 x_1 < B_1$ together imply that

$$\alpha(x) B_0 + [1 - \alpha(x)] A_1 < \alpha(x)[\beta_0 + \beta_1 x_0] + [1 - \alpha(x)][\beta_0 + \beta_1 x_1] < \alpha(x) A_0 + [1 - \alpha(x)] B_1.$$

Combining this with (S.11.1) and the formulas for $L(x)$ and $U(x)$ yields $L(x) < \beta_0 + \beta_1 x < U(x)$
as desired.


11.4 Bayesian Inference in Simple Linear Regression 


Commentary 


This section only discusses Bayesian analysis with improper priors. There are a couple of reasons for this. 
First, the posterior distribution that results from the improper prior makes many of the Bayesian inferences 
strikingly similar to their non-Bayesian counterparts. Second, the derivation of the posterior distribution 
from a proper prior is mathematically much more difficult than the derivation given here, and I felt that 
this would distract the reader from the real purpose of this section, namely to illustrate Bayesian posterior 
inference. This section describes some inferences that are similar to non-Bayesian inferences as well as some 
that are uniquely Bayesian. 


Solutions to Exercises 


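1. The posterior distribution of $\beta_1$ is given as a special case of (11.4.1), namely that $U = s_x(\beta_1 - \hat\beta_1)/\sigma'$ has the $t$ distribution with $n - 2$ degrees of freedom. The coefficient $1 - \alpha_0$ confidence interval from Sec. 11.3 has endpoints $\hat\beta_1 \pm T_{n-2}^{-1}(1 - \alpha_0/2)\sigma'/s_x$. So, we can compute the posterior probability that $\beta_1$ is in the interval as follows:

\[ \Pr\left(\hat\beta_1 - T_{n-2}^{-1}(1 - \alpha_0/2)\frac{\sigma'}{s_x} < \beta_1 < \hat\beta_1 + T_{n-2}^{-1}(1 - \alpha_0/2)\frac{\sigma'}{s_x}\right) \]
\[ = \Pr\left(-T_{n-2}^{-1}(1 - \alpha_0/2) < \frac{s_x(\beta_1 - \hat\beta_1)}{\sigma'} < T_{n-2}^{-1}(1 - \alpha_0/2)\right). \qquad \text{(S.11.2)} \]

Since the $t$ distributions are symmetric around 0, $-T_{n-2}^{-1}(1 - \alpha_0/2) = T_{n-2}^{-1}(\alpha_0/2)$. Also, the random variable between the inequalities on the right side of (S.11.2) is $U$, which has the $t$ distribution with $n - 2$ degrees of freedom. Hence the right side of (S.11.2) equals

\[ \Pr(U < T_{n-2}^{-1}(1 - \alpha_0/2)) - \Pr(U \le T_{n-2}^{-1}(\alpha_0/2)) = 1 - \alpha_0/2 - \alpha_0/2 = 1 - \alpha_0. \]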


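2. The posterior distribution of $c_0\beta_0 + c_1\beta_1$ is given in (11.4.1), namely that

\[ U = \left[\frac{c_0^2}{n} + \frac{(c_0\bar{x}_n - c_1)^2}{s_x^2}\right]^{-1/2} \frac{c_0\beta_0 + c_1\beta_1 - [c_0\hat\beta_0 + c_1\hat\beta_1]}{\sigma'} \]

has the $t$ distribution with $n - 2$ degrees of freedom. The coefficient $1 - \alpha_0$ confidence interval from Sec. 11.3 has endpoints

\[ c_0\hat\beta_0 + c_1\hat\beta_1 \pm T_{n-2}^{-1}(1 - \alpha_0/2)\,\sigma'\left[\frac{c_0^2}{n} + \frac{(c_0\bar{x}_n - c_1)^2}{s_x^2}\right]^{1/2}. \]

So, we can compute the posterior probability that $c_0\beta_0 + c_1\beta_1$ is in the interval as follows:

\[ \Pr\left(c_0\hat\beta_0 + c_1\hat\beta_1 - T_{n-2}^{-1}(1 - \alpha_0/2)\,\sigma'\left[\frac{c_0^2}{n} + \frac{(c_0\bar{x}_n - c_1)^2}{s_x^2}\right]^{1/2} < c_0\beta_0 + c_1\beta_1 \right. \]
\[ \left. < c_0\hat\beta_0 + c_1\hat\beta_1 + T_{n-2}^{-1}(1 - \alpha_0/2)\,\sigma'\left[\frac{c_0^2}{n} + \frac{(c_0\bar{x}_n - c_1)^2}{s_x^2}\right]^{1/2}\right) \]
\[ = \Pr\left(-T_{n-2}^{-1}(1 - \alpha_0/2) < U < T_{n-2}^{-1}(1 - \alpha_0/2)\right). \]

As in the proof of Exercise 1, this equals $1 - \alpha_0$.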
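3. The joint distribution of $(\beta_0, \beta_1)$ given $\tau$ is a bivariate normal distribution as specified in Theorem 11.4.1. Using the means, variances, and correlation given in that theorem, we compute the mean of $\beta_0 + \beta_1 x$ as $\hat\beta_0 + \hat\beta_1 x = \hat{Y}$. The variance of $\beta_0 + \beta_1 x$ given $\tau$ is

\[ \frac{1}{\tau}\left[\frac{\sum_{i=1}^n x_i^2}{n s_x^2} + \frac{x^2}{s_x^2} - 2x\,\frac{n\bar{x}_n}{\left(n\sum_{i=1}^n x_i^2\right)^{1/2}} \left(\frac{\sum_{i=1}^n x_i^2}{n s_x^2}\right)^{1/2} \frac{1}{s_x}\right]. \]

Use the fact that $1/n + \bar{x}_n^2/s_x^2 = \sum_{i=1}^n x_i^2/[n s_x^2]$ to simplify the above variance to the expression

\[ \frac{1}{\tau}\left[\frac{1}{n} + \frac{(x - \bar{x}_n)^2}{s_x^2}\right]. \]

It follows that the conditional distribution of $\tau^{1/2}(\beta_0 + \beta_1 x - \hat{Y})$ is the normal distribution with mean 0 and variance as stated in the exercise.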


4. The summaries from a simple linear regression are $\hat\beta_0 = 0.1472$, $\hat\beta_1 = 0.4352$, $\sigma' = 0.2374$, $\bar{x}_n = 0.42$, $n = 10$, and $s_x^2 = 8.396$.

(a) The posterior distribution of the parameters is given in Theorem 11.4.1. With the numerical summaries above (recall that $\sum_{i=1}^n x_i^2 = s_x^2 + n\bar{x}_n^2 = 10.16$), we get the following posterior. Conditional on $\tau$, $(\beta_0, \beta_1)$ has a bivariate normal distribution with mean vector $(0.1472, 0.4352)$, correlation $-0.4167$, and variances $0.1210/\tau$ and $0.1191/\tau$. The distribution of $\tau$ is a gamma distribution with parameters 4 and 0.2254.

(b) The interval is centered at 0.4352 with half-width equal to $T_8^{-1}(0.95)$ times $0.2374/8.396^{1/2} = 0.0819$. So, the interval is [0.2828, 0.5876].

(c) The posterior distribution of $\beta_0$ is that $U = (\beta_0 - 0.1472)/0.1210^{1/2}$ has the $t$ distribution with 8 degrees of freedom. So the probability that $\beta_0$ is between 0 and 2 is the probability that $U$ is between $(0 - 0.1472)/0.3479 = -0.4232$ and $(2 - 0.1472)/0.3479 = 5.326$. The probability that a $t$ random variable with 8 degrees of freedom is between these two numbers can be found using a computer program, and it equals 0.6580.
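The posterior summaries quoted in part (a) can be reproduced from the regression summaries alone. The sketch below assumes the usual form of Theorem 11.4.1 (variances $\sum x_i^2/[n s_x^2]$ and $1/s_x^2$ divided by $\tau$, correlation $-n\bar{x}_n/(n\sum x_i^2)^{1/2}$, and a gamma distribution with parameters $(n-2)/2$ and $(n-2)\sigma'^2/2$ for $\tau$); the variable names are illustrative only.

```python
import math

# Summary statistics quoted in the solution to Exercise 4.
n = 10
xbar = 0.42
sx2 = 8.396           # s_x^2
sigma_prime = 0.2374  # sigma'

# Sum of squares of the x_i, recovered from s_x^2 and the sample mean.
sum_x2 = sx2 + n * xbar ** 2

# Conditional on tau, the variances of beta_0 and beta_1 are these numbers over tau.
var_b0 = sum_x2 / (n * sx2)
var_b1 = 1.0 / sx2

# Posterior correlation between beta_0 and beta_1 given tau.
corr = -n * xbar / math.sqrt(n * sum_x2)

# Gamma parameters for tau (assumed form: (n - 2)/2 and (n - 2) * sigma'^2 / 2).
gamma_shape = (n - 2) / 2
gamma_rate = (n - 2) * sigma_prime ** 2 / 2

print(round(sum_x2, 2), round(var_b0, 4), round(var_b1, 4))
print(round(corr, 4), gamma_shape, round(gamma_rate, 4))
```

The printed values can be compared with 10.16, 0.1210, 0.1191, $-0.4167$, 4, and 0.2254 in the solution.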


5. The summary data are in the solution to Exercise 4.

(a) According to Theorem 11.4.1, the posterior distribution of $\beta_1$ is that $2.898(\beta_1 - 0.4352)/0.2374$ has the $t$ distribution with eight degrees of freedom.

(b) According to Theorem 11.4.1, the posterior distribution of $\beta_0 + \beta_1$ is that

\[ \left[0.1 + \frac{(0.42 - 1)^2}{2.898^2}\right]^{-1/2} \frac{\beta_0 + \beta_1 - 0.5824}{0.2374} \]

has the $t$ distribution with eight degrees of freedom.


6. The summary information from the regression is $\hat\beta_0 = 1.132$, $\hat\beta_1 = 0.9547$, $\sigma' = 0.2776$, $\bar{x}_n = 3.206$, $n = 14$, and $s_x^2 = 19.11$.

(a) The posterior distribution of $\beta_1$ is that $U = 19.11^{1/2}(\beta_1 - 0.9547)/0.2776$ has the $t$ distribution with 12 degrees of freedom.

(b) The probability that $\beta_1 \le 2$ is the same as the probability that $U \le 19.11^{1/2}(2 - 0.9547)/0.2776 = 16.46$, which is essentially 1.

(c) The interval for log-price will be centered at $1.132 + 0.9547 \times \log(21.4) = 4.057$ and have half-width $T_{12}^{-1}(0.975)$ times $0.2776[1 + 1/14 + (3.206 - \log(21.4))^2/19.11]^{1/2} = 0.2875$. So, the interval for log-price is [3.431, 4.683]. The interval for 1980 price is $e$ to the power of the endpoints, [30.90, 108.1].
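The arithmetic in part (c) can be checked with a short script. This is only a sketch: it uses the intercept 1.132 that appears in the centering calculation, and it hard-codes 2.179 as the 0.975 quantile of the $t$ distribution with 12 degrees of freedom (a table value, assumed here rather than computed).

```python
import math

# Regression summaries used in part (c) of the solution.
b0_hat, b1_hat = 1.132, 0.9547
sigma_prime = 0.2776
xbar, n, sx2 = 3.206, 14, 19.11

# Assumed table value for the 0.975 quantile of t with 12 degrees of freedom.
t_quantile = 2.179

x = math.log(21.4)                    # predictor value on the log scale
center = b0_hat + b1_hat * x          # center of the interval for log-price
scale = sigma_prime * math.sqrt(1 + 1 / n + (xbar - x) ** 2 / sx2)
half_width = t_quantile * scale

log_interval = (center - half_width, center + half_width)
price_interval = tuple(math.exp(v) for v in log_interval)

print(round(center, 3), round(scale, 4))
print([round(v, 3) for v in log_interval])
print([round(v, 1) for v in price_interval])
```

Small differences from [30.90, 108.1] in the last line come from exponentiating unrounded rather than rounded endpoints.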


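7. The conditional mean of $\beta_0$ given $\beta_1$ can be computed using results from Sec. 5.10. In particular,

\[ E(\beta_0 \mid \beta_1) = \hat\beta_0 - \frac{n\bar{x}_n}{\left(n\sum_{i=1}^n x_i^2\right)^{1/2}} \left(\frac{1/n + \bar{x}_n^2/s_x^2}{1/s_x^2}\right)^{1/2} (\beta_1 - \hat\beta_1). \]

Now, use the fact that $\sum_{i=1}^n x_i^2 = s_x^2 + n\bar{x}_n^2$. The result is

\[ E(\beta_0 \mid \beta_1) = \hat\beta_0 + \bar{x}_n(\hat\beta_1 - \beta_1). \]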


11.5 The General Linear Model and Multiple Regression 


Commentary 


If one is using the software R, the commands to fit multiple linear regression models are the same as those 
that fit simple linear regression as described in the Commentaries to Secs. 11.1-11.3. One need only put the 
additional predictor variables into additional columns of the x matrix. 
Solutions to Exercises


1. After we have replaced $\beta_0, \ldots, \beta_{p-1}$ in (11.5.4) by their M.L.E.'s $\hat\beta_0, \ldots, \hat\beta_{p-1}$, the maximization with respect to $\sigma^2$ is exactly the same as the maximization carried out in Example 7.5.6 in the text for finding the M.L.E. of $\sigma^2$ or the maximization carried out in Exercise 1 of Sec. 11.2.

2. (The statement of the exercise should say that $S^2/\sigma^2$ has a $\chi^2$ distribution.) According to Eq. (11.5.8),

\[ \sigma'^2 = \frac{S^2}{n - p}. \]

Since we assume that $S^2/\sigma^2$ has a $\chi^2$ distribution with $n - p$ degrees of freedom, the mean of $S^2$ is $\sigma^2(n - p)$, hence the mean of $\sigma'^2$ is $\sigma^2$, and $\sigma'^2$ is unbiased.


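3. This problem is a special case of the general linear model with $p = 1$. The design matrix $Z$ defined by Eq. (11.5.9) has dimension $n \times 1$ and is specified as follows:

\[ Z = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}. \]

Therefore, $Z'Z = \sum_{i=1}^n x_i^2$ and $(Z'Z)^{-1} = 1\big/\sum_{i=1}^n x_i^2$. It follows from Eq. (11.5.10) that

\[ \hat\beta = \frac{\sum_{i=1}^n x_i Y_i}{\sum_{i=1}^n x_i^2}. \]

4. From Theorem 11.5.3, $E(\hat\beta) = \beta$ and $\mathrm{Var}(\hat\beta) = \sigma^2\big/\sum_{i=1}^n x_i^2$.

5. It is found that $\sum_{i=1}^{10} x_i y_i = 342.4$ and $\sum_{i=1}^{10} x_i^2 = 66.8$. Therefore, from Exercises 3 and 4, $\hat\beta = 5.126$ and $\mathrm{Var}(\hat\beta) = 0.0150\sigma^2$. Also, $S^2 = \sum_{i=1}^{10}(y_i - \hat\beta x_i)^2 = 169.94$. Therefore, by Eq. (11.5.7), $\hat\sigma^2 = (169.94)/10 = 16.994$.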


6. By Eq. (11.5.21), the following statistic will have the $t$ distribution with 9 degrees of freedom when $H_0$ is true:


9 1/2, 
= | 7.0150)(169.94) a 


The corresponding two-sided tail area is smaller than 0.01, the smallest two-sided tail area available 
from the table in the back of the book. 
7. The values of $\hat\beta_0$, $\hat\beta_1$, and $\hat\beta_2$ were determined in Example 11.1.3. By Eq. (11.5.7),

\[ \hat\sigma^2 = \frac{1}{10}\sum_{i=1}^{10}(y_i - \hat\beta_0 - \hat\beta_1 x_i - \hat\beta_2 x_i^2)^2 = \frac{1}{10}(9.37) = 0.937. \]

8. The design matrix $Z$ has the following form:

\[ Z = \begin{bmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 \end{bmatrix}. \]

Therefore, $Z'Z$ is the $3 \times 3$ matrix of coefficients on the left side of the three equations in (11.1.14):

\[ Z'Z = \begin{bmatrix} 10 & 23.3 & 90.37 \\ 23.3 & 90.37 & 401 \\ 90.37 & 401 & 1892.7 \end{bmatrix}. \]

It will now be found that

\[ (Z'Z)^{-1} = \begin{bmatrix} 0.400 & -0.307 & 0.046 \\ -0.307 & 0.421 & -0.074 \\ 0.046 & -0.074 & 0.014 \end{bmatrix}. \]

The elements of $(Z'Z)^{-1}$, multiplied by $\sigma^2$, are the variances and covariances of $\hat\beta_0$, $\hat\beta_1$, and $\hat\beta_2$.

9. By Eq. (11.5.21), the following statistic will have the $t$ distribution with 7 degrees of freedom when $H_0$ is true:

\[ U_2 = \left[\frac{7}{(0.014)(9.37)}\right]^{1/2} \hat\beta_2 = 0.095. \]

The corresponding two-sided tail area is greater than 0.90. The null hypothesis would not be rejected at any reasonable level of significance.

10. By Eq. (11.5.21), the following statistic will have the $t$ distribution with 7 degrees of freedom when $H_0$ is true:

\[ U_1 = \left[\frac{7}{(0.421)(9.37)}\right]^{1/2} (\hat\beta_1 - 4) = -4.51. \]

The corresponding two-sided tail area is less than 0.01.

11. It is found that $\sum_{i=1}^{10}(y_i - \bar{y}_n)^2 = 26.309$. Therefore,

\[ R^2 = 1 - \frac{9.37}{26.309} = 0.644. \]
12. The values of $\hat\beta_0$, $\hat\beta_1$, and $\hat\beta_2$ were determined in Example 11.1.5. By Eq. (11.5.7),

\[ \hat\sigma^2 = \frac{1}{10}\sum_{i=1}^{10}(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2})^2 = \frac{1}{10}(8.865) = 0.8865. \]

13. The design matrix $Z$ has the following form:

\[ Z = \begin{bmatrix} 1 & x_{11} & x_{12} \\ 1 & x_{21} & x_{22} \\ \vdots & \vdots & \vdots \\ 1 & x_{n1} & x_{n2} \end{bmatrix}. \]

Therefore, $Z'Z$ is the $3 \times 3$ matrix of coefficients on the left side of the three equations in (11.1.14):

\[ Z'Z = \begin{bmatrix} 10 & 23.3 & 650 \\ 23.3 & 90.37 & 1563.6 \\ 650 & 1563.6 & 42{,}334 \end{bmatrix}. \]

It will now be found that

\[ (Z'Z)^{-1} = \begin{bmatrix} 222.7 & 4.832 & -3.598 \\ 4.832 & 0.1355 & -0.0792 \\ -3.598 & -0.0792 & 0.0582 \end{bmatrix}. \]

The elements of $(Z'Z)^{-1}$, multiplied by $\sigma^2$, are the variances and covariances of $\hat\beta_0$, $\hat\beta_1$, and $\hat\beta_2$.

14. By Eq. (11.5.21), the following statistic will have the $t$ distribution with 7 degrees of freedom when $H_0$ is true:

\[ U_1 = \left[\frac{7}{(0.1355)(8.865)}\right]^{1/2} \hat\beta_1 = 1.087. \]

The corresponding two-sided tail area is between 0.30 and 0.40.

15. By Eq. (11.5.21), the following statistic will have the $t$ distribution with 7 degrees of freedom when $H_0$ is true:

\[ U_2 = \left[\frac{7}{(0.0582)(8.865)}\right]^{1/2} (\hat\beta_2 + 1) = 4.319. \]

The corresponding two-sided tail area is less than 0.01.

16. Just as in Exercise 11, $\sum_{i=1}^{10}(y_i - \bar{y}_n)^2 = 26.309$. Therefore,

\[ R^2 = 1 - \frac{8.865}{26.309} = 0.663. \]
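The $t$ statistics and $R^2$ for the regression of Example 11.1.5 can be recomputed from the quantities quoted in these solutions. The sketch below assumes that Eq. (11.5.21) has the form $U_j = [(n-p)/(\zeta_{jj}S^2)]^{1/2}(\hat\beta_j - \beta_j^*)$, and it takes the estimates 0.4503 and 0.1725 from the solution to Exercise 21; the helper name `t_statistic` is hypothetical.

```python
import math

S2 = 8.865          # residual sum of squares from Example 11.1.5
df = 7              # n - p = 10 - 3
zeta_11 = 0.1355    # diagonal entry of (Z'Z)^{-1} for beta_1
zeta_22 = 0.0582    # diagonal entry of (Z'Z)^{-1} for beta_2
b1_hat, b2_hat = 0.4503, 0.1725   # estimates as quoted in Exercise 21
total_ss = 26.309   # sum of squared deviations of y about its mean

def t_statistic(estimate, null_value, zeta_jj):
    """Assumed form of Eq. (11.5.21): scaled distance from the null value."""
    return math.sqrt(df / (zeta_jj * S2)) * (estimate - null_value)

U1 = t_statistic(b1_hat, 0.0, zeta_11)    # H0: beta_1 = 0
U2 = t_statistic(b2_hat, -1.0, zeta_22)   # H0: beta_2 = -1
R2 = 1 - S2 / total_ss

print(round(U1, 3), round(U2, 3), round(R2, 3))
```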
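\[ \mathrm{Cov}(\hat\beta_j, A_{ij}) = \mathrm{Cov}\left(\hat\beta_j,\ \hat\beta_i - \frac{\zeta_{ij}}{\zeta_{jj}}\hat\beta_j\right) = \mathrm{Cov}(\hat\beta_j, \hat\beta_i) - \frac{\zeta_{ij}}{\zeta_{jj}}\mathrm{Cov}(\hat\beta_j, \hat\beta_j) \]
\[ = \mathrm{Cov}(\hat\beta_i, \hat\beta_j) - \frac{\zeta_{ij}}{\zeta_{jj}}\mathrm{Var}(\hat\beta_j) = \zeta_{ij}\sigma^2 - \frac{\zeta_{ij}}{\zeta_{jj}}\zeta_{jj}\sigma^2 = 0. \]

Just as in simple linear regression, it can be shown that the joint distribution of two estimators $\hat\beta_i$ and $\hat\beta_j$ will be a bivariate normal distribution. Since $A_{ij}$ is a linear function of $\hat\beta_i$ and $\hat\beta_j$, the joint distribution of $A_{ij}$ and $\hat\beta_j$ will also be a bivariate normal distribution. Therefore, since $A_{ij}$ and $\hat\beta_j$ are uncorrelated, they are also independent.

\[ \mathrm{Var}(A_{ij}) = \mathrm{Var}\left(\hat\beta_i - \frac{\zeta_{ij}}{\zeta_{jj}}\hat\beta_j\right) = \mathrm{Var}(\hat\beta_i) + \left(\frac{\zeta_{ij}}{\zeta_{jj}}\right)^2 \mathrm{Var}(\hat\beta_j) - 2\frac{\zeta_{ij}}{\zeta_{jj}}\mathrm{Cov}(\hat\beta_i, \hat\beta_j) \]
\[ = \zeta_{ii}\sigma^2 + \frac{\zeta_{ij}^2}{\zeta_{jj}}\sigma^2 - 2\frac{\zeta_{ij}^2}{\zeta_{jj}}\sigma^2 = \left(\zeta_{ii} - \frac{\zeta_{ij}^2}{\zeta_{jj}}\right)\sigma^2. \]

Now consider the right side of the equation given in the hint for this exercise.

\[ [A_{ij} - E(A_{ij})]^2 = \left[\hat\beta_i - \beta_i - \frac{\zeta_{ij}}{\zeta_{jj}}(\hat\beta_j - \beta_j)\right]^2 = (\hat\beta_i - \beta_i)^2 - \frac{2\zeta_{ij}}{\zeta_{jj}}(\hat\beta_i - \beta_i)(\hat\beta_j - \beta_j) + \left(\frac{\zeta_{ij}}{\zeta_{jj}}\right)^2(\hat\beta_j - \beta_j)^2. \]

If each of the two terms on the right side of the equation given in the hint is put over the least common denominator $(\zeta_{ii}\zeta_{jj} - \zeta_{ij}^2)\sigma^2$, the right side can be reduced to the form given for $W^2$ in the text of the exercise. In the equation for $W^2$ given in the hint, $W^2$ has been represented as the sum of two independent random variables, each of which is the square of a variable having a standard normal distribution. Therefore, $W^2$ has a $\chi^2$ distribution with 2 degrees of freedom.

(a) Since $W^2$ is a function only of $\hat\beta_i$ and $\hat\beta_j$, it follows that $W^2$ and $S^2$ are independent. Also, $W^2$ has a $\chi^2$ distribution with 2 degrees of freedom and $S^2/\sigma^2$ has a $\chi^2$ distribution with $n - p$ degrees of freedom. Therefore,

\[ \frac{W^2/2}{S^2/[\sigma^2(n - p)]} \]

has the $F$ distribution with 2 and $n - p$ degrees of freedom.

(b) If we replace $\beta_i$ and $\beta_j$ in $W^2$ by their hypothesized values $\beta_i^*$ and $\beta_j^*$, then the statistic given in part (a) will have the $F$ distribution with 2 and $n - p$ degrees of freedom when $H_0$ is true and will tend to be larger when $H_0$ is not true. Therefore, we should reject $H_0$ if that statistic exceeds some constant $C$, where $C$ can be chosen to obtain any specified level of significance $\alpha_0$ ($0 < \alpha_0 < 1$).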
20. In this problem $i = 2$, $j = 3$, $\beta_1^* = \beta_2^* = 0$ and, from the values found in Exercises 7 and 8,

\[ W^2 = \frac{(0.014)(0.616)^2 + (0.421)(0.013)^2 + 2(0.074)(0.616)(0.013)}{[(0.421)(0.014) - (0.074)^2]\sigma^2} = \frac{16.7}{\sigma^2}. \]

Also, $S^2 = 9.37$, as found in the solution of Exercise 7. Hence, the value of the $F$ statistic with 2 and 7 degrees of freedom is $(7/2)(16.7/9.37) = 6.23$. The corresponding tail area is between 0.025 and 0.05.

21. In this problem, $i = 2$, $j = 3$, $\beta_1^* = 1$, $\beta_2^* = 0$ and, from the values found in Exercises 12 and 13,

\[ W^2 = \frac{(0.0582)(0.4503 - 1)^2 + (0.1355)(0.1725)^2 + 2(0.0792)(0.4503 - 1)(0.1725)}{[(0.1355)(0.0582) - (0.0792)^2]\sigma^2} = \frac{4.091}{\sigma^2}. \]

Also, $S^2 = 8.865$, as found in the solution of Exercise 12. Hence, the value of the $F$ statistic with 2 and 7 degrees of freedom is $(7/2)(4.091/8.865) = 1.615$. The corresponding tail area is greater than 0.05.
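The joint test in Exercise 21 can be checked numerically. The sketch below assumes the general form $W^2\sigma^2 = [\zeta_{jj}d_i^2 + \zeta_{ii}d_j^2 - 2\zeta_{ij}d_i d_j]/(\zeta_{ii}\zeta_{jj} - \zeta_{ij}^2)$ with $d_i = \hat\beta_i - \beta_i^*$, which matches the displayed numerator once $\zeta_{ij} = -0.0792$ is substituted. Variable names are illustrative.

```python
# Entries of (Z'Z)^{-1} for the two coefficients being tested (Exercise 13).
zeta_ii, zeta_jj, zeta_ij = 0.1355, 0.0582, -0.0792

# Differences between the estimates and their hypothesized values.
d_i = 0.4503 - 1.0   # beta_1 estimate minus hypothesized value 1
d_j = 0.1725 - 0.0   # beta_2 estimate minus hypothesized value 0

numerator = zeta_jj * d_i ** 2 + zeta_ii * d_j ** 2 - 2 * zeta_ij * d_i * d_j
denominator = zeta_ii * zeta_jj - zeta_ij ** 2
w2_times_sigma2 = numerator / denominator   # this is W^2 * sigma^2

S2 = 8.865           # residual sum of squares
df_resid = 7         # n - p
F = (w2_times_sigma2 / 2) / (S2 / df_resid)

print(round(w2_times_sigma2, 3), round(F, 3))
```

The result differs from 4.091 in the third decimal place only because the inputs are rounded.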


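22. $S^2 = \sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2$. Since $\hat\beta_0 = \bar{y}_n - \hat\beta_1\bar{x}_n$,

\[ S^2 = \sum_{i=1}^n [(y_i - \bar{y}_n) - \hat\beta_1(x_i - \bar{x}_n)]^2 = \sum_{i=1}^n (y_i - \bar{y}_n)^2 + \hat\beta_1^2 \sum_{i=1}^n (x_i - \bar{x}_n)^2 - 2\hat\beta_1 \sum_{i=1}^n (x_i - \bar{x}_n)(y_i - \bar{y}_n). \]

Since $\hat\beta_1 = \sum_{i=1}^n (x_i - \bar{x}_n)(y_i - \bar{y}_n) \big/ \sum_{i=1}^n (x_i - \bar{x}_n)^2$ and $R^2 = 1 - S^2 \big/ \sum_{i=1}^n (y_i - \bar{y}_n)^2$, the desired result can now be obtained.

23. We have the following relations:

\[ E(X + Y) = E\begin{bmatrix} X_1 + Y_1 \\ \vdots \\ X_n + Y_n \end{bmatrix} = \begin{bmatrix} E(X_1 + Y_1) \\ \vdots \\ E(X_n + Y_n) \end{bmatrix} = \begin{bmatrix} E(X_1) + E(Y_1) \\ \vdots \\ E(X_n) + E(Y_n) \end{bmatrix} = \begin{bmatrix} E(X_1) \\ \vdots \\ E(X_n) \end{bmatrix} + \begin{bmatrix} E(Y_1) \\ \vdots \\ E(Y_n) \end{bmatrix} = E(X) + E(Y). \]

24. The element in row $i$ and column $j$ of the $n \times n$ matrix $\mathrm{Cov}(X + Y)$ is $\mathrm{Cov}(X_i + Y_i, X_j + Y_j) = \mathrm{Cov}(X_i, X_j) + \mathrm{Cov}(X_i, Y_j) + \mathrm{Cov}(Y_i, X_j) + \mathrm{Cov}(Y_i, Y_j)$. Since $X$ and $Y$ are independent, $\mathrm{Cov}(X_i, Y_j) = 0$ and $\mathrm{Cov}(Y_i, X_j) = 0$. Therefore, this element reduces to $\mathrm{Cov}(X_i, X_j) + \mathrm{Cov}(Y_i, Y_j)$. But $\mathrm{Cov}(X_i, X_j)$ is the element in row $i$ and column $j$ of $\mathrm{Cov}(X)$, and $\mathrm{Cov}(Y_i, Y_j)$ is the corresponding element in $\mathrm{Cov}(Y)$. Hence, the sum of these two covariances is the element in row $i$ and column $j$ of $\mathrm{Cov}(X) + \mathrm{Cov}(Y)$. Thus, we have shown that the element in row $i$ and column $j$ of $\mathrm{Cov}(X + Y)$ is equal to the corresponding element of $\mathrm{Cov}(X) + \mathrm{Cov}(Y)$.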
25. We know that $\mathrm{Var}(3Y_1 + Y_2 - 2Y_3 + 8) = \mathrm{Var}(3Y_1 + Y_2 - 2Y_3)$. By Theorem 11.5.2, with $p = 1$,

\[ \mathrm{Var}(3Y_1 + Y_2 - 2Y_3) = (3, 1, -2)\,\mathrm{Cov}(Y) \begin{bmatrix} 3 \\ 1 \\ -2 \end{bmatrix}. \]

26. (a) We can see that $\sum_{j=0}^{p-1} c_j\hat\beta_j$ is equal to $c'\hat\beta$, where $c$ is defined in part (b) and $\hat\beta$ is the least-squares regression coefficient vector. If $Y$ is the vector in Eq. (11.5.13), then we can write $c'\hat\beta = a'Y$, where $a' = c'(Z'Z)^{-1}Z'$. It follows from Theorem 11.3.1 that $a'Y$ has a normal distribution, and it follows from Theorem 11.5.2 that the mean of $a'Y$ is $c'\beta$ and the variance is

\[ \sigma^2 a'a = \sigma^2 c'(Z'Z)^{-1}c. \qquad \text{(S.11.3)} \]

(b) If $H_0$ is true, then $c'\hat\beta$ has the normal distribution with mean $c_*$ and variance given by (S.11.3). It follows that the following random variable $Z$ has a standard normal distribution:

\[ Z = \frac{c'\hat\beta - c_*}{\sigma\,(c'(Z'Z)^{-1}c)^{1/2}}. \]

Also, recall that $(n - p)\sigma'^2/\sigma^2$ has a $\chi^2$ distribution with $n - p$ degrees of freedom and is independent of $Z$. So, if we divide $Z$ by $\sigma'/\sigma$, we get a random variable that has a $t$ distribution with $n - p$ degrees of freedom, which also happens to equal $U$.

To test $H_0$ at level $\alpha_0$, we can reject $H_0$ if $|U| > T_{n-p}^{-1}(1 - \alpha_0/2)$. If $H_0$ is true,

\[ \Pr(|U| > T_{n-p}^{-1}(1 - \alpha_0/2)) = \alpha_0, \]

so this test will have level $\alpha_0$.


27. In a simple linear regression, $\hat{Y}_i$ is the same linear function of $X_i$ for all $i$. If $\hat\beta_1 > 0$, then every unit increase in $X$ corresponds to an increase of $\hat\beta_1$ in $\hat{Y}$. So, a plot of residuals against $\hat{Y}$ will look the same as a plot of residuals against $X$ except that the horizontal axis will be labeled differently. If $\hat\beta_1 < 0$, then a unit increase in $X$ corresponds to a decrease of $-\hat\beta_1$ in $\hat{Y}$, so a plot of residuals against fitted values is a mirror image of a plot of residuals against $X$. (The plot is flipped horizontally around a vertical line.)

28. Since $R^2$ is a decreasing function of the residual sum of squares, we shall show that the residual sum of squares is at least as large when using $Z'$ as when using $Z$. Let $Z$ have $p$ columns and let $Z'$ have $q < p$ columns. Let $\hat\beta_*$ be the least-squares coefficients that we get when using design matrix $Z'$. For each column that was deleted from $Z$ to get $Z'$, insert an additional coordinate equal to 0 into the $q$-dimensional vector $\hat\beta_*$ to produce the $p$-dimensional vector $\tilde\beta$. This vector $\tilde\beta$ is one of the possible vectors in the solution of the minimization problem to find the least-squares estimates with the design matrix $Z$. Furthermore, since $\tilde\beta$ has 0's for all of the extra columns that are in $Z$ but not in $Z'$, it follows that the residual sum of squares when using $\tilde\beta$ with design matrix $Z$ is identical to the residual sum of squares when using $\hat\beta_*$ with design matrix $Z'$. Hence the minimum residual sum of squares available with design matrix $Z$ must be no larger than the residual sum of squares using $\hat\beta_*$ with design matrix $Z'$.
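The argument in Exercise 28 can be illustrated with the smallest possible nested pair: an intercept-only model (the design matrix with the $x$ column deleted) and a simple linear regression. The data below are made up purely for illustration, and the function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def rss_intercept_only(y):
    """Residual sum of squares when the design matrix has only a column of 1's."""
    ybar = mean(y)
    return sum((yi - ybar) ** 2 for yi in y)

def rss_simple_line(x, y):
    """Residual sum of squares for the least-squares line through (x, y)."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

# Hypothetical data, not taken from the text.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

rss_small = rss_intercept_only(y)   # fewer columns
rss_big = rss_simple_line(x, y)     # one extra column
total_ss = rss_intercept_only(y)    # denominator of R^2

r2_small = 1 - rss_small / total_ss
r2_big = 1 - rss_big / total_ss
print(rss_small >= rss_big, r2_big >= r2_small)
```

Setting the slope to 0 in the larger model reproduces the smaller model's fit, which is exactly the padding-with-zeros step in the proof.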


29. In Example 11.5.5, we are told that $\sigma' = 352.9$, so the residual sum of squares is 2864383. We can calculate $\sum_{i=1}^n (y_i - \bar{y}_n)^2$ directly from the data in Table 11.13. It equals 26844478. It follows that

\[ R^2 = 1 - \frac{2864383}{26844478} = 0.893. \]

30. Use the notation in the solution of Exercise 26. Suppose that $c'\beta = d$. Then $U$ has a noncentral $t$ distribution with $n - p$ degrees of freedom and noncentrality parameter $(d - c_*)/[\sigma(c'(Z'Z)^{-1}c)^{1/2}]$. The argument is the same as any of those in the book that derived noncentral $t$ distributions.


11.6 Analysis of Variance 


Commentary


If one is using the software R, there is a more direct way to fit an analysis of variance model than to construct the design matrix. Let y contain the observed responses arranged as in Eq. (11.6.1). Suppose also that x is a vector of the same dimension as y, each of whose values is the first subscript $i$ of $Y_{ij}$ in Eq. (11.6.1), so that the value of x identifies which of the $p$ samples each observation comes from. Specifically, the first $n_1$ elements of x would be 1, the next $n_2$ would be 2, etc. Then aovfit=lm(y~factor(x)) will fit the analysis of variance model. The function factor converts the vector of integers into category identifiers. Then anova(aovfit) will print the ANOVA table.


Solutions to Exercises 


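1. By analogy with Eq. (11.6.2),

\[ Z'Z = \begin{bmatrix} n_1 & 0 & \cdots & 0 \\ 0 & n_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & n_p \end{bmatrix} \quad \text{and} \quad (Z'Z)^{-1} = \begin{bmatrix} 1/n_1 & 0 & \cdots & 0 \\ 0 & 1/n_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/n_p \end{bmatrix}. \]

Also,

\[ Z'Y = \begin{bmatrix} \sum_{j=1}^{n_1} Y_{1j} \\ \vdots \\ \sum_{j=1}^{n_p} Y_{pj} \end{bmatrix} \quad \text{and} \quad (Z'Z)^{-1}Z'Y = \begin{bmatrix} \bar{Y}_{1+} \\ \vdots \\ \bar{Y}_{p+} \end{bmatrix}. \]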

2. Let $A$ be the orthogonal matrix whose first row is $u$ in the statement of the problem. Define

\[ Y = \begin{bmatrix} n_1^{1/2}\bar{Y}_{1+} \\ \vdots \\ n_p^{1/2}\bar{Y}_{p+} \end{bmatrix}. \]

Let $v'$ be a vector that is orthogonal to $u$ (like all the other rows of $A$). Then $v'X = v'Y/\sigma$. Define

\[ U = AX = (U_1, \ldots, U_p)', \qquad V = AY = (V_1, \ldots, V_p)'. \]

We just showed that $V_i/\sigma = U_i$ for $i = 2, \ldots, p$. Now,

\[ X'X = (AX)'(AX) = \sum_{i=1}^p U_i^2, \qquad Y'Y = (AY)'(AY) = \sum_{i=1}^p V_i^2. \]

Notice that $V_1 = \sum_{i=1}^p n_i\bar{Y}_{i+}/n^{1/2} = n^{1/2}\bar{Y}_{++}$, hence

\[ \frac{S^2_{\mathrm{Betw}}}{\sigma^2} = \frac{1}{\sigma^2}\left(\sum_{i=1}^p n_i\bar{Y}_{i+}^2 - n\bar{Y}_{++}^2\right) = \frac{1}{\sigma^2}(Y'Y - V_1^2) = \frac{1}{\sigma^2}\sum_{i=2}^p V_i^2 = \sum_{i=2}^p U_i^2. \]

Now, the coordinates of $X$ are i.i.d. standard normal random variables, so the coordinates of $U$ are also i.i.d. standard normal random variables. Hence $\sum_{i=2}^p U_i^2 = S^2_{\mathrm{Betw}}/\sigma^2$ has a $\chi^2$ distribution with $p - 1$ degrees of freedom.


3. Use the definitions of $\bar{Y}_{i+}$ and $\bar{Y}_{++}$ in the text to compute

\[ \sum_{i=1}^p n_i(\bar{Y}_{i+} - \bar{Y}_{++})^2 = \sum_{i=1}^p n_i\bar{Y}_{i+}^2 + n\bar{Y}_{++}^2 - 2\bar{Y}_{++}\sum_{i=1}^p n_i\bar{Y}_{i+} = \sum_{i=1}^p n_i\bar{Y}_{i+}^2 - n\bar{Y}_{++}^2. \]

4. (a) It is found that $\bar{Y}_{1+} = 6.6$, $\bar{Y}_{2+} = 9.0$, and $\bar{Y}_{3+} = 10.2$. Also,

\[ \sum_{j=1}^{n_1}(Y_{1j} - \bar{Y}_{1+})^2 = 1.90, \quad \sum_{j=1}^{n_2}(Y_{2j} - \bar{Y}_{2+})^2 = 17.8, \quad \sum_{j=1}^{n_3}(Y_{3j} - \bar{Y}_{3+})^2 = 5.54. \]

Hence, by Eq. (11.6.4), $\hat\sigma^2 = (1.90 + 17.8 + 5.54)/13 = 1.942$.

(b) It is found that $\bar{Y}_{++} = 8.538$. Hence, by Eq. (11.6.9), $U^2 = 10(24.591)/[2(25.24)] = 4.871$. When $H_0$ is true, the statistic $U^2$ has the $F$ distribution with 2 and 10 degrees of freedom. Therefore, the tail area corresponding to the value $U^2 = 4.871$ is between 0.025 and 0.05.


5. In this problem $n_i = 10$ for $i = 1, 2, 3, 4$, and $\bar{Y}_{1+} = 105.7$, $\bar{Y}_{2+} = 102.0$, $\bar{Y}_{3+} = 93.5$, $\bar{Y}_{4+} = 110.8$, and $\bar{Y}_{++} = 103$. Also,

\[ \sum_{j=1}^{10}(Y_{1j} - \bar{Y}_{1+})^2 = 303, \quad \sum_{j=1}^{10}(Y_{2j} - \bar{Y}_{2+})^2 = 544, \quad \sum_{j=1}^{10}(Y_{3j} - \bar{Y}_{3+})^2 = 250, \quad \sum_{j=1}^{10}(Y_{4j} - \bar{Y}_{4+})^2 = 364. \]

Therefore, by Eq. (11.6.9),

\[ U^2 = \frac{36(1593.8)}{3(1461)} = 13.09. \]

When $H_0$ is true, the statistic $U^2$ has the $F$ distribution with 3 and 36 degrees of freedom. The tail area corresponding to the value $U^2 = 13.09$ is found to be less than 0.025.
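The $F$ statistic in Exercise 5 depends only on the group sizes, the group means, and the within-group sums of squares, so it can be rebuilt from the numbers quoted above. The following sketch assumes that Eq. (11.6.9) is the usual ratio of the between-samples mean square to the residual mean square; variable names are illustrative.

```python
# Group summaries quoted in the solution to Exercise 5.
sizes = [10, 10, 10, 10]
means = [105.7, 102.0, 93.5, 110.8]
within_ss = [303, 544, 250, 364]   # sum of squared deviations inside each group

n = sum(sizes)
p = len(sizes)

# Grand mean as the size-weighted average of the group means.
grand_mean = sum(ni * m for ni, m in zip(sizes, means)) / n

# Between-samples and residual sums of squares.
s_betw = sum(ni * (m - grand_mean) ** 2 for ni, m in zip(sizes, means))
s_resid = sum(within_ss)

# Assumed form of Eq. (11.6.9): ratio of mean squares.
U2 = (s_betw / (p - 1)) / (s_resid / (n - p))

print(round(grand_mean, 1), round(s_betw, 1), s_resid, round(U2, 2))
```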
6. The random variables $Q_1, \ldots, Q_p$ are independent, because each $Q_i$ is a function of a different group of the observations in the sample. It is known that $Q_i/\sigma^2$ has a $\chi^2$ distribution with $n_i - 1$ degrees of freedom. Therefore, $(Q_1 + \cdots + Q_p)/\sigma^2$ will have a $\chi^2$ distribution with $\sum_{i=1}^p (n_i - 1) = n - p$ degrees of freedom. Since $Q_1$ and $Q_p$ are independent,

\[ \frac{Q_1/[\sigma^2(n_1 - 1)]}{Q_p/[\sigma^2(n_p - 1)]} \]

will have the $F$ distribution with $n_1 - 1$ and $n_p - 1$ degrees of freedom. In other words, $(n_p - 1)Q_1/[(n_1 - 1)Q_p]$ will have that $F$ distribution.


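7. If $U$ is defined by Eq. (9.6.3), then

\[ U^2 = \frac{(m + n - 2)(\bar{X}_m - \bar{Y}_n)^2}{\left(\frac{1}{m} + \frac{1}{n}\right)(S_X^2 + S_Y^2)}. \]

The correspondence between the notation of Sec. 9.6 and our present notation is as follows:

Notation of Sec. 9.6      Present notation
$m$                       $n_1$
$n$                       $n_2$
$\bar{X}_m$               $\bar{Y}_{1+}$
$\bar{Y}_n$               $\bar{Y}_{2+}$
$S_X^2$                   $\sum_{j=1}^{n_1}(Y_{1j} - \bar{Y}_{1+})^2$
$S_Y^2$                   $\sum_{j=1}^{n_2}(Y_{2j} - \bar{Y}_{2+})^2$

Since $p = 2$, $\bar{Y}_{++} = n_1\bar{Y}_{1+}/n + n_2\bar{Y}_{2+}/n$. Therefore,

\[ n_1(\bar{Y}_{1+} - \bar{Y}_{++})^2 + n_2(\bar{Y}_{2+} - \bar{Y}_{++})^2 = n_1\left(\frac{n_2}{n}\right)^2(\bar{Y}_{1+} - \bar{Y}_{2+})^2 + n_2\left(\frac{n_1}{n}\right)^2(\bar{Y}_{1+} - \bar{Y}_{2+})^2 \]
\[ = \frac{n_1 n_2}{n}(\bar{Y}_{1+} - \bar{Y}_{2+})^2 = \frac{(\bar{Y}_{1+} - \bar{Y}_{2+})^2}{\frac{1}{n_1} + \frac{1}{n_2}}, \]

since $n_1 + n_2 = n$.

Also, since $m + n$ in the notation of Sec. 9.6 is simply $n$ in our present notation, we can now rewrite the expression for $U^2$ as follows:

\[ U^2 = \frac{(n - 2)\sum_{i=1}^2 n_i(\bar{Y}_{i+} - \bar{Y}_{++})^2}{\sum_{i=1}^2\sum_{j=1}^{n_i}(Y_{ij} - \bar{Y}_{i+})^2}. \]

This expression reduces to the expression for $U^2$ given in Eq. (11.6.9), with $p = 2$.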


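8.
\[ E\left[\frac{1}{n - p}\sum_{i=1}^p\sum_{j=1}^{n_i}(Y_{ij} - \bar{Y}_{i+})^2\right] = \frac{1}{n - p}\sum_{i=1}^p E\left[\sum_{j=1}^{n_i}(Y_{ij} - \bar{Y}_{i+})^2\right] \]
\[ = \frac{1}{n - p}\sum_{i=1}^p (n_i - 1)\sigma^2, \text{ by the results of Sec. 8.7,} \]
\[ = \frac{1}{n - p}(n - p)\sigma^2 = \sigma^2. \]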


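9. Each of the three given random variables is a linear function of the independent random variables $Y_{rs}$ ($s = 1, \ldots, n_r$ and $r = 1, \ldots, p$).

Let $W_1 = \sum_{r,s} a_{rs}Y_{rs}$, $W_2 = \sum_{r,s} b_{rs}Y_{rs}$, and $W_3 = \sum_{r,s} c_{rs}Y_{rs}$. We have $a_{ij} = 1 - \frac{1}{n_i}$, $a_{is} = -\frac{1}{n_i}$ for $s \ne j$, and $a_{rs} = 0$ for $r \ne i$. Also, $b_{i's} = \frac{1}{n_{i'}} - \frac{1}{n}$ and $b_{rs} = -\frac{1}{n}$ for $r \ne i'$. Finally, $c_{rs} = \frac{1}{n}$ for all values of $r$ and $s$.

Now,

\[ \mathrm{Cov}(W_1, W_2) = \mathrm{Cov}\left(\sum_{r,s} a_{rs}Y_{rs},\ \sum_{r',s'} b_{r's'}Y_{r's'}\right) = \sum_{r,s}\sum_{r',s'} a_{rs}b_{r's'}\mathrm{Cov}(Y_{rs}, Y_{r's'}). \]

But $\mathrm{Cov}(Y_{rs}, Y_{r's'}) = 0$ unless $r = r'$ and $s = s'$, since any two distinct $Y$'s are independent. Also, $\mathrm{Cov}(Y_{rs}, Y_{rs}) = \mathrm{Var}(Y_{rs}) = \sigma^2$.

Therefore, $\mathrm{Cov}(W_1, W_2) = \sigma^2\sum_{r,s} a_{rs}b_{rs}$. If $i = i'$,

\[ \sum_{r,s} a_{rs}b_{rs} = a_{ij}b_{ij} + \sum_{s \ne j} a_{is}b_{is} = \left(1 - \frac{1}{n_i}\right)\left(\frac{1}{n_i} - \frac{1}{n}\right) + (n_i - 1)\left(-\frac{1}{n_i}\right)\left(\frac{1}{n_i} - \frac{1}{n}\right) = 0. \]

If $i \ne i'$,

\[ \sum_{r,s} a_{rs}b_{rs} = \left(1 - \frac{1}{n_i}\right)\left(-\frac{1}{n}\right) + (n_i - 1)\left(-\frac{1}{n_i}\right)\left(-\frac{1}{n}\right) = 0. \]

Similarly,

\[ \mathrm{Cov}(W_1, W_3) = \sigma^2\sum_{r,s} a_{rs}c_{rs} = \sigma^2\frac{1}{n}\left(a_{ij} + \sum_{s \ne j} a_{is}\right) = 0. \]

Finally,

\[ \mathrm{Cov}(W_2, W_3) = \sigma^2\sum_{r,s} b_{rs}c_{rs} = \sigma^2\frac{1}{n}\sum_{r,s} b_{rs} = \sigma^2\frac{1}{n}\left[\left(1 - \frac{n_{i'}}{n}\right) - \frac{n - n_{i'}}{n}\right] = 0. \]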


10. (a) The three group averages are 825.8, 845.0, and 775.0. The residual sum of squares is 1671. The ANOVA table is then

Source of variation   Degrees of freedom   Sum of squares   Mean square
Between samples                2                15703           7851
Residuals                     15                 1671           111.4
Total                         17                17374

(b) The $F$ statistic is $7851/111.4 = 70.48$. Comparing this to the $F$ distribution with 2 and 15 degrees of freedom, we get a $p$-value of essentially 0.


11. Write $S^2_{\mathrm{Betw}} = \sum_{i=1}^p n_i\bar{Y}_{i+}^2 - n\bar{Y}_{++}^2$. Recall that the mean of the square of a normal random variable with mean $\mu$ and variance $\sigma^2$ is $\mu^2 + \sigma^2$. The distribution of $\bar{Y}_{i+}$ is the normal distribution with mean $\mu_i$ and variance $\sigma^2/n_i$, while the distribution of $\bar{Y}_{++}$ is the normal distribution with mean $\bar\mu$ and variance $\sigma^2/n$. Hence the mean of $S^2_{\mathrm{Betw}}$ is

\[ \sum_{i=1}^p n_i(\mu_i^2 + \sigma^2/n_i) - n(\bar\mu^2 + \sigma^2/n). \]

If we collect terms in this sum, we get $(p - 1)\sigma^2 + \sum_{i=1}^p n_i\mu_i^2 - n\bar\mu^2$. This simplifies to the expression stated in the exercise.


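12. If the null hypothesis is true, the likelihood is

\[ (2\pi)^{-n/2}\sigma^{-n}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^p\sum_{j=1}^{n_i}(y_{ij} - \mu)^2\right). \]

This is maximized by choosing $\mu = \bar{y}_{++}$ and

\[ \sigma^2 = \frac{1}{n}\sum_{i=1}^p\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{++})^2 = \frac{1}{n}S^2_{\mathrm{Tot}}. \]

The resulting maximum value of the likelihood is

\[ (2\pi)^{-n/2}\frac{n^{n/2}}{(S^2_{\mathrm{Tot}})^{n/2}}\exp\left(-\frac{n}{2}\right). \qquad \text{(S.11.4)} \]

If the null hypothesis is false, the likelihood is

\[ (2\pi)^{-n/2}\sigma^{-n}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^p\sum_{j=1}^{n_i}(y_{ij} - \mu_i)^2\right). \]

This is maximized by choosing $\mu_i = \bar{y}_{i+}$ and

\[ \sigma^2 = \frac{1}{n}\sum_{i=1}^p\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{i+})^2 = \frac{1}{n}S^2_{\mathrm{Resid}}. \]

The resulting maximum value of the likelihood is

\[ (2\pi)^{-n/2}\frac{n^{n/2}}{(S^2_{\mathrm{Resid}})^{n/2}}\exp\left(-\frac{n}{2}\right). \qquad \text{(S.11.5)} \]

The ratio of (S.11.5) to (S.11.4) is

\[ \left(\frac{S^2_{\mathrm{Tot}}}{S^2_{\mathrm{Resid}}}\right)^{n/2} = \left(1 + \frac{S^2_{\mathrm{Betw}}}{S^2_{\mathrm{Resid}}}\right)^{n/2}. \]

Rejecting $H_0$ when this ratio is greater than $k$ is equivalent to rejecting when $U^2 > k'$ for some other constant $k'$. In order for the test to have level $\alpha_0$, $k'$ must equal the $1 - \alpha_0$ quantile of the distribution of $U^2$ when $H_0$ is true. We saw in the text that this distribution is the $F$ distribution with $p - 1$ and $n - p$ degrees of freedom.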
13. We know that $S^2_{\mathrm{Tot}} = S^2_{\mathrm{Betw}} + S^2_{\mathrm{Resid}}$. We also know that $S^2_{\mathrm{Betw}}/\sigma^2$ and $S^2_{\mathrm{Resid}}/\sigma^2$ are independent. If $H_0$ is true then they both have $\chi^2$ distributions, one with $p - 1$ degrees of freedom (see Exercise 2) and the other with $n - p$ degrees of freedom as in the text. The sum of two independent $\chi^2$ random variables has the $\chi^2$ distribution with the sum of the degrees of freedom, in this case $n - 1$.

14. (a) (The exercise should have asked you to prove that $\sum_{i=1}^p n_i\alpha_i = 0$.) We see that $\sum_{i=1}^p n_i\alpha_i = \sum_{i=1}^p n_i\mu_i - n\mu = 0$.

(b) The M.L.E. of $\mu_i$ is $\bar{Y}_{i+}$ and the M.L.E. of $\mu$ is $\bar{Y}_{++}$, so the M.L.E. of $\alpha_i$ is $\bar{Y}_{i+} - \bar{Y}_{++}$.

(c) Notice that all of the $\mu_i$ equal each other if and only if they all equal $\mu$, if and only if all $\alpha_i = 0$.

(d) This fact was proven in Exercise 11 with slightly different notation.


11.7 The Two-Way Layout 


Commentary


If one is using the software R, one can fit a two-way layout using lm with two factor variables. As in the Commentary to Sec. 11.6 above, let y contain the observed responses, and let x1 and x2 be two factor variables giving the levels of the two factors in the layout. Then aovfit=lm(y~x1+x2) will fit the model, and anova(aovfit) will print the ANOVA table.


Solutions to Exercises 


1. Write $S_A^2 = J\sum_{i=1}^I \bar{Y}_{i+}^2 - IJ\bar{Y}_{++}^2$. Recall that the mean of the square of a normal random variable with mean $\mu$ and variance $\sigma^2$ is $\mu^2 + \sigma^2$. The distribution of $\bar{Y}_{i+}$ is the normal distribution with mean $\mu_i$ and variance $\sigma^2/J$, while the distribution of $\bar{Y}_{++}$ is the normal distribution with mean $\mu$ and variance $\sigma^2/[IJ]$. Hence the mean of $S_A^2$ is $J\sum_{i=1}^I(\mu_i^2 + \sigma^2/J) - IJ(\mu^2 + \sigma^2/[IJ])$. If we collect terms in this sum, we get

\[ (I - 1)\sigma^2 + J\sum_{i=1}^I \mu_i^2 - IJ\mu^2 = (I - 1)\sigma^2 + J\sum_{i=1}^I(\mu_i - \mu)^2 = (I - 1)\sigma^2 + J\sum_{i=1}^I \alpha_i^2. \]


2. In each part of this exercise, let $\mu_{ij}$ denote the element in row $i$ and column $j$ of the given matrix.

(a) The effects are not additive because $\mu_{21} - \mu_{11} = 5 \ne \mu_{22} - \mu_{12} = 7$.

(b) The effects are additive because each element in the second row is 1 unit larger than the corresponding element in the first row. Alternatively, we could say that the effects are additive because each element in the second column is 3 units larger than the corresponding element in the first column.

(c) The effects are additive because each element in the first row is 5 units smaller than the corresponding element in the second row and is 1 unit smaller than the corresponding element in the third row.

(d) The effects are not additive because, for example, $\mu_{21} - \mu_{11} = 1 \ne \mu_{22} - \mu_{12} = 2$.
3. If the effects are additive, then there exist numbers $\theta_i$ and $\psi_j$ such that Eq. (11.7.1) is satisfied for $i = 1, \ldots, I$ and $j = 1, \ldots, J$. Let $\bar\theta = \frac{1}{I}\sum_{i=1}^I \theta_i$ and $\bar\psi = \frac{1}{J}\sum_{j=1}^J \psi_j$, and define

\[ \mu = \bar\theta + \bar\psi, \]
\[ \alpha_i = \theta_i - \bar\theta \quad \text{for } i = 1, \ldots, I, \]
\[ \beta_j = \psi_j - \bar\psi \quad \text{for } j = 1, \ldots, J. \]

Then it follows that Eqs. (11.7.2) and (11.7.3) will be satisfied.

It remains to be shown that no other set of values of $\mu$, $\alpha_i$, and $\beta_j$ will satisfy Eqs. (11.7.2) and (11.7.3). Suppose that $\mu'$, $\alpha_i'$, and $\beta_j'$ are another such set of values. Then $\mu + \alpha_i + \beta_j = \mu' + \alpha_i' + \beta_j'$ for all $i$ and $j$. By summing both sides of this relation over $i$ and $j$, we obtain the relation $IJ\mu = IJ\mu'$. Hence, $\mu = \mu'$. It follows, therefore, that $\alpha_i + \beta_j = \alpha_i' + \beta_j'$ for all $i$ and $j$. By summing both sides of this relation over $j$, we obtain the result $\alpha_i = \alpha_i'$ for every value of $i$. Similarly, by summing both sides over $i$, we obtain the relation $\beta_j = \beta_j'$ for every value of $j$.


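4. If the effects are additive, so that Eq. (11.7.1) is satisfied, and we denote the elements of the matrix by $\mu_{ij} = \theta_i + \psi_j$, then $\bar\mu_{++} = \bar\theta + \bar\psi$, $\bar\mu_{i+} = \theta_i + \bar\psi$, and $\bar\mu_{+j} = \bar\theta + \psi_j$. Therefore, $\mu_{ij} = \bar\mu_{++} + (\bar\mu_{i+} - \bar\mu_{++}) + (\bar\mu_{+j} - \bar\mu_{++})$. Hence, it can be verified that $\mu = \bar\mu_{++}$, $\alpha_i = \bar\mu_{i+} - \bar\mu_{++}$, and $\beta_j = \bar\mu_{+j} - \bar\mu_{++}$. In this exercise,

\[ \bar\mu_{++} = \tfrac{1}{4}(3 + 6 + 4 + 7) = 5, \]
\[ \bar\mu_{1+} = \tfrac{1}{2}(3 + 6) = 4.5, \qquad \bar\mu_{2+} = \tfrac{1}{2}(4 + 7) = 5.5, \]
\[ \bar\mu_{+1} = \tfrac{1}{2}(3 + 4) = 3.5, \qquad \bar\mu_{+2} = \tfrac{1}{2}(6 + 7) = 6.5. \]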
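The decomposition $\mu_{ij} = \mu + \alpha_i + \beta_j$ used in Exercise 4 is easy to verify numerically. The sketch below assumes the $2 \times 2$ matrix of means has rows $(3, 6)$ and $(4, 7)$, which is what the row and column sums in the solution imply; the variable names are illustrative.

```python
# Matrix of means assumed from the sums in the solution: rows (3, 6) and (4, 7).
mu = [[3.0, 6.0],
      [4.0, 7.0]]
I, J = len(mu), len(mu[0])

# Overall mean, row means, and column means.
grand = sum(sum(row) for row in mu) / (I * J)
row_means = [sum(row) / J for row in mu]
col_means = [sum(mu[i][j] for i in range(I)) / I for j in range(J)]

# Effects defined as deviations of the marginal means from the overall mean.
alpha = [r - grand for r in row_means]
beta = [c - grand for c in col_means]

# The effects are additive exactly when every cell is recovered by mu + alpha_i + beta_j.
is_additive = all(
    abs(mu[i][j] - (grand + alpha[i] + beta[j])) < 1e-12
    for i in range(I) for j in range(J)
)

print(grand, row_means, col_means)
print(alpha, beta, is_additive)
```

Replacing `mu` by a matrix such as the one in part (a) of Exercise 2 would make `is_additive` false, because no choice of row and column effects reproduces every cell.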

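5. In this exercise,

\[ \bar\mu_{++} = \frac{1}{12}\sum_{i=1}^3\sum_{j=1}^4 \mu_{ij} = \frac{1}{12}(39) = 3.25, \]
\[ \bar\mu_{1+} = \frac{5}{4} = 1.25, \quad \bar\mu_{2+} = \frac{25}{4} = 6.25, \quad \bar\mu_{3+} = \frac{9}{4} = 2.25, \]
\[ \bar\mu_{+1} = \frac{15}{3} = 5, \quad \bar\mu_{+2} = \frac{3}{3} = 1, \quad \bar\mu_{+3} = \frac{6}{3} = 2, \quad \bar\mu_{+4} = \frac{15}{3} = 5. \]

It follows that $\alpha_1 = 1.25 - 3.25 = -2$, $\alpha_2 = 6.25 - 3.25 = 3$, and $\alpha_3 = 2.25 - 3.25 = -1$. Also, $\beta_1 = 5 - 3.25 = 1.75$, $\beta_2 = 1 - 3.25 = -2.25$, $\beta_3 = 2 - 3.25 = -1.25$, and $\beta_4 = 5 - 3.25 = 1.75$.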


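6. $\sum_{i=1}^I \hat\alpha_i = \sum_{i=1}^I(\bar{Y}_{i+} - \bar{Y}_{++}) = I\bar{Y}_{++} - I\bar{Y}_{++} = 0$. A similar argument shows that $\sum_{j=1}^J \hat\beta_j = 0$. Also, since $E(Y_{ij}) = \mu + \alpha_i + \beta_j$,

\[ E(\hat\mu) = E(\bar{Y}_{++}) = \frac{1}{IJ}\sum_{i=1}^I\sum_{j=1}^J E(Y_{ij}) = \frac{1}{IJ}\sum_{i=1}^I\sum_{j=1}^J(\mu + \alpha_i + \beta_j) = \frac{1}{IJ}(IJ\mu + 0 + 0) = \mu, \]
\[ E(\hat\alpha_i) = E(\bar{Y}_{i+} - \bar{Y}_{++}) = E\left(\frac{1}{J}\sum_{j=1}^J Y_{ij}\right) - \mu = \frac{1}{J}\sum_{j=1}^J(\mu + \alpha_i + \beta_j) - \mu = \alpha_i. \]

A similar argument shows that $E(\hat\beta_j) = \beta_j$.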
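7.
\[ \mathrm{Var}(\hat\mu) = \mathrm{Var}\left(\frac{1}{IJ}\sum_{i=1}^I\sum_{j=1}^J Y_{ij}\right) = \frac{1}{I^2J^2}\sum_{i=1}^I\sum_{j=1}^J \mathrm{Var}(Y_{ij}) = \frac{1}{I^2J^2}IJ\sigma^2 = \frac{\sigma^2}{IJ}. \]

The estimator $\hat\alpha_i$ is a linear function of the $IJ$ independent random variables $Y_{rs}$ ($r = 1, \ldots, I$ and $s = 1, \ldots, J$). If we let $\hat\alpha_i = \sum_{r=1}^I\sum_{s=1}^J a_{rs}Y_{rs}$, then it can be found that $a_{is} = \frac{1}{J} - \frac{1}{IJ}$ for $s = 1, \ldots, J$ and $a_{rs} = -\frac{1}{IJ}$ for $r \ne i$. Therefore,

\[ \mathrm{Var}(\hat\alpha_i) = \sum_{r=1}^I\sum_{s=1}^J a_{rs}^2\sigma^2 = \sigma^2\left[J\left(\frac{1}{J} - \frac{1}{IJ}\right)^2 + (I - 1)J\left(\frac{1}{IJ}\right)^2\right] = \frac{I - 1}{IJ}\sigma^2. \]

The value of $\mathrm{Var}(\hat\beta_j)$ can be found similarly.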


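8. If the square on the right of Eq. (11.7.9) is expanded and all the summations are then performed, we will obtain the right side of Eq. (11.7.8) provided that the summation of each of the cross product terms is 0. We shall now establish this result. Note that

\[ \sum_{j=1}^J (Y_{ij} - \bar{Y}_{i+} - \bar{Y}_{+j} + \bar{Y}_{++}) = J\bar{Y}_{i+} - J\bar{Y}_{i+} - J\bar{Y}_{++} + J\bar{Y}_{++} = 0. \]

Similarly, $\sum_{i=1}^I (Y_{ij} - \bar{Y}_{i+} - \bar{Y}_{+j} + \bar{Y}_{++}) = 0$. Therefore,

\[ \sum_{i=1}^I\sum_{j=1}^J (\bar{Y}_{i+} - \bar{Y}_{++})(Y_{ij} - \bar{Y}_{i+} - \bar{Y}_{+j} + \bar{Y}_{++}) = \sum_{i=1}^I (\bar{Y}_{i+} - \bar{Y}_{++})\sum_{j=1}^J (Y_{ij} - \bar{Y}_{i+} - \bar{Y}_{+j} + \bar{Y}_{++}) = 0, \]
\[ \sum_{i=1}^I\sum_{j=1}^J (\bar{Y}_{+j} - \bar{Y}_{++})(Y_{ij} - \bar{Y}_{i+} - \bar{Y}_{+j} + \bar{Y}_{++}) = \sum_{j=1}^J (\bar{Y}_{+j} - \bar{Y}_{++})\sum_{i=1}^I (Y_{ij} - \bar{Y}_{i+} - \bar{Y}_{+j} + \bar{Y}_{++}) = 0. \]

The remaining cross product factors as $\left[\sum_{i=1}^I(\bar{Y}_{i+} - \bar{Y}_{++})\right]\left[\sum_{j=1}^J(\bar{Y}_{+j} - \bar{Y}_{++})\right]$, and each factor is 0.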


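9. Each of the four given random variables is a linear function of the independent random variables $Y_{rs}$ ($r = 1, \ldots, I$ and $s = 1, \ldots, J$). Let

\[ W_1 = \sum_{r=1}^I\sum_{s=1}^J a_{rs}Y_{rs}, \qquad W_2 = \sum_{r=1}^I\sum_{s=1}^J b_{rs}Y_{rs}, \qquad W_3 = \sum_{r=1}^I\sum_{s=1}^J c_{rs}Y_{rs}. \]

Then

\[ a_{ij} = 1 - \frac{1}{J} - \frac{1}{I} + \frac{1}{IJ}, \qquad a_{rj} = -\frac{1}{I} + \frac{1}{IJ} \text{ for } r \ne i, \]
\[ a_{is} = -\frac{1}{J} + \frac{1}{IJ} \text{ for } s \ne j, \qquad a_{rs} = \frac{1}{IJ} \text{ for } r \ne i \text{ and } s \ne j. \]

Also,

\[ b_{i's} = \frac{1}{J} - \frac{1}{IJ} \quad \text{and} \quad b_{rs} = -\frac{1}{IJ} \text{ for } r \ne i', \]
\[ c_{rj'} = \frac{1}{I} - \frac{1}{IJ} \quad \text{and} \quad c_{rs} = -\frac{1}{IJ} \text{ for } s \ne j'. \]

As in Exercise 12 of Sec. 11.5, $\mathrm{Cov}(W_1, W_2) = \sigma^2\sum_{r,s} a_{rs}b_{rs}$. If $i = i'$,

\[ \sum_{r,s} a_{rs}b_{rs} = \left(1 - \frac{1}{J} - \frac{1}{I} + \frac{1}{IJ}\right)\left(\frac{1}{J} - \frac{1}{IJ}\right) + (J - 1)\left(-\frac{1}{J} + \frac{1}{IJ}\right)\left(\frac{1}{J} - \frac{1}{IJ}\right) \]
\[ + (I - 1)\left(-\frac{1}{I} + \frac{1}{IJ}\right)\left(-\frac{1}{IJ}\right) + (I - 1)(J - 1)\left(\frac{1}{IJ}\right)\left(-\frac{1}{IJ}\right) = 0. \]

If $i \ne i'$,

\[ \sum_{r,s} a_{rs}b_{rs} = \left(1 - \frac{1}{J} - \frac{1}{I} + \frac{1}{IJ}\right)\left(-\frac{1}{IJ}\right) \quad ((i, j) \text{ term}) \]
\[ + (J - 1)\left(-\frac{1}{J} + \frac{1}{IJ}\right)\left(-\frac{1}{IJ}\right) \quad ((i, s) \text{ terms for } s \ne j) \]
\[ + \left(-\frac{1}{I} + \frac{1}{IJ}\right)\left(\frac{1}{J} - \frac{1}{IJ}\right) \quad ((i', j) \text{ term}) \]
\[ + (J - 1)\left(\frac{1}{IJ}\right)\left(\frac{1}{J} - \frac{1}{IJ}\right) \quad ((i', s) \text{ terms for } s \ne j) \]
\[ + (I - 2)\left(-\frac{1}{I} + \frac{1}{IJ}\right)\left(-\frac{1}{IJ}\right) \quad ((r, j) \text{ terms for } r \ne i, i') \]
\[ + (I - 2)(J - 1)\left(\frac{1}{IJ}\right)\left(-\frac{1}{IJ}\right) \quad ((r, s) \text{ terms for } r \ne i, i' \text{ and } s \ne j) \]
\[ = 0. \]

Similarly, the covariance between any other pair of the four variables can be shown to be 0.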


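10.
\[ \sum_{i=1}^I (\bar{Y}_{i+} - \bar{Y}_{++})^2 = \sum_{i=1}^I \bar{Y}_{i+}^2 - 2\bar{Y}_{++}\sum_{i=1}^I \bar{Y}_{i+} + I\bar{Y}_{++}^2 = \sum_{i=1}^I \bar{Y}_{i+}^2 - 2I\bar{Y}_{++}^2 + I\bar{Y}_{++}^2 = \sum_{i=1}^I \bar{Y}_{i+}^2 - I\bar{Y}_{++}^2. \]

The other part of this exercise is proved similarly.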


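11.
\[ \sum_{i=1}^I\sum_{j=1}^J (Y_{ij} - \bar{Y}_{i+} - \bar{Y}_{+j} + \bar{Y}_{++})^2 = \sum_i\sum_j Y_{ij}^2 + J\sum_i \bar{Y}_{i+}^2 + I\sum_j \bar{Y}_{+j}^2 + IJ\bar{Y}_{++}^2 \]
\[ - 2\sum_i\sum_j Y_{ij}\bar{Y}_{i+} - 2\sum_j\sum_i Y_{ij}\bar{Y}_{+j} + 2\bar{Y}_{++}\sum_i\sum_j Y_{ij} \]
\[ + 2\sum_i\sum_j \bar{Y}_{i+}\bar{Y}_{+j} - 2J\bar{Y}_{++}\sum_i \bar{Y}_{i+} - 2I\bar{Y}_{++}\sum_j \bar{Y}_{+j} \]
\[ = \sum_i\sum_j Y_{ij}^2 + J\sum_i \bar{Y}_{i+}^2 + I\sum_j \bar{Y}_{+j}^2 + IJ\bar{Y}_{++}^2 - 2J\sum_i \bar{Y}_{i+}^2 - 2I\sum_j \bar{Y}_{+j}^2 + 2IJ\bar{Y}_{++}^2 \]
\[ + 2IJ\bar{Y}_{++}^2 - 2IJ\bar{Y}_{++}^2 - 2IJ\bar{Y}_{++}^2 \]
\[ = \sum_i\sum_j Y_{ij}^2 - J\sum_i \bar{Y}_{i+}^2 - I\sum_j \bar{Y}_{+j}^2 + IJ\bar{Y}_{++}^2. \]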
It is found that ¥14=17.4, Yo, = 15.94, ¥g. = 17.08, Ys, = 15.1, ¥.o = 14.6, ¥s3 = 15.5333, ¥i4 = 


19.5667, ¥+5 = 19.2333, Y,, = 16.8097. The values of fi, for i = 1,2,3, and 6; for j = 1,...,5, can 
now be obtained from Eq. (11.7.6). 


13. The estimate of $E(Y_{ij})$ is $\hat{\mu} + \hat{\alpha}_i + \hat{\beta}_j$. From the values given in the solution of Exercise 12, we therefore obtain the following table of estimated expectations:

  i \ j      1          2          3          4          5
    1     15.6933    15.1933    16.1267    20.16      19.8267
    2     14.2333    13.7333    14.6667    18.7       18.3667
    3     15.3733    14.8733    15.8067    19.84      19.5067

Furthermore, Theorem 11.7.1 says that the M.L.E. of $\sigma^2$ is

\[ \hat{\sigma}^2 = \frac{1}{15}(29.470667) = 1.9647. \]
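
The table of estimated expectations can be reproduced from the marginal averages alone, since $\hat{\mu} + \hat{\alpha}_i + \hat{\beta}_j = \bar{Y}_{i+} + \bar{Y}_{+j} - \bar{Y}_{++}$. The following is a minimal Python sketch, not part of the original solution; the function name `fitted_table` is hypothetical, and the inputs are the rounded averages quoted above, so the last digit may differ slightly from the table.

```python
# Marginal averages quoted in the solution (rounded values).
row_means = [17.4, 15.94, 17.08]                      # Ybar_{i+}, i = 1..3
col_means = [15.1, 14.6, 15.5333, 19.5667, 19.2333]   # Ybar_{+j}, j = 1..5

# With equal cell counts, the grand mean is the average of the row means.
grand_mean = sum(row_means) / len(row_means)


def fitted_table(rows, cols, grand):
    """Return the I x J table of Ybar_{i+} + Ybar_{+j} - Ybar_{++}."""
    return [[r + c - grand for c in cols] for r in rows]


table = fitted_table(row_means, col_means, grand_mean)
for row in table:
    print("  ".join(f"{value:8.4f}" for value in row))
```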


14. It is found from Eq. (11.7.12) that

\[ U_A^2 = \frac{20(1.177865)}{29.470667} = 0.799. \]

When the null hypothesis is true, $U_A^2$ will have the $F$ distribution with $I - 1 = 2$ and $(I-1)(J-1) = 8$ degrees of freedom. The tail area corresponding to the value just calculated is found to be greater than 0.05.


15. It is found from Eq. (11.7.13) that

\[ U_B^2 = \frac{6(22.909769)}{29.470667} = 4.664. \]

When the null hypothesis is true, $U_B^2$ will have the $F$ distribution with $J - 1 = 4$ and $(I-1)(J-1) = 8$ degrees of freedom. The tail area corresponding to the value just calculated is found to be between 0.025 and 0.05.
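
The two $F$ statistics above can be checked from the marginal averages. The sketch below is an illustration in Python under stated assumptions: it uses the usual mean-square ratios, which for this layout reduce to $J(J-1)\sum_i(\bar{Y}_{i+}-\bar{Y}_{++})^2/S^2_{\mathrm{Resid}}$ and $I(I-1)\sum_j(\bar{Y}_{+j}-\bar{Y}_{++})^2/S^2_{\mathrm{Resid}}$, and it takes the residual sum of squares 29.470667 as given rather than recomputing it from the data.

```python
I, J = 3, 5
row_means = [17.4, 15.94, 17.08]                      # Ybar_{i+}
col_means = [15.1, 14.6, 15.5333, 19.5667, 19.2333]   # Ybar_{+j}
s_resid = 29.470667                                   # residual sum of squares (quoted)

grand = sum(row_means) / I

# Sums of squared deviations of the marginal averages from the grand mean.
ss_rows = sum((r - grand) ** 2 for r in row_means)
ss_cols = sum((c - grand) ** 2 for c in col_means)

# Mean-square ratios: [S_A^2/(I-1)] / [S_Resid^2/((I-1)(J-1))], and similarly for B.
u_a_sq = J * (J - 1) * ss_rows / s_resid
u_b_sq = I * (I - 1) * ss_cols / s_resid

print(f"U_A^2 = {u_a_sq:.3f}, U_B^2 = {u_b_sq:.3f}")
```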


16. If the null hypothesis in (11.7.15) is true, then all $Y_{ij}$ have the same mean $\mu$. The random variables $S_A^2/\sigma^2$, $S_B^2/\sigma^2$, and $S_{\mathrm{Resid}}^2/\sigma^2$ are independent, and their distributions are $\chi^2$ with $I - 1$, $J - 1$, and $(I-1)(J-1)$ degrees of freedom respectively. Hence $(S_A^2 + S_B^2)/\sigma^2$ has the $\chi^2$ distribution with $I + J - 2$ degrees of freedom and is independent of $S_{\mathrm{Resid}}^2$. The conclusion now follows directly from the definition of the $F$ distribution.
11.8 The Two-Way Layout with Replications 


Commentary 


This section is optional. There is reference to some of the material in this section in Chapter 12. In 
particular, Example 12.3.4 shows how to use simulation to compute the size of the two-stage test procedure 
that is described in Sec. 11.8. 

If one is using the software R, and if one has factor variables x1 and x2 (as in the Commentary to 
Sec. 11.7) giving the levels of the two factors in a two-way layout with replication, then the following
commands will fit the model and print the ANOVA table:
aovfit=lm(y~x1*x2) 
anova(aovfit) 


Solutions to Exercises 


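1. Let $\mu = \bar{\theta}_{++}$, $\alpha_i = \bar{\theta}_{i+} - \bar{\theta}_{++}$, $\beta_j = \bar{\theta}_{+j} - \bar{\theta}_{++}$, and $\gamma_{ij} = \theta_{ij} - \bar{\theta}_{i+} - \bar{\theta}_{+j} + \bar{\theta}_{++}$ for $i = 1, \ldots, I$ and $j = 1, \ldots, J$. Then it can be verified that Eqs. (11.8.4) and (11.8.5) are satisfied. It remains to be shown that no other set of values of $\mu$, $\alpha_i$, $\beta_j$, and $\gamma_{ij}$ will satisfy Eqs. (11.8.4) and (11.8.5).

Suppose that $\mu'$, $\alpha_i'$, $\beta_j'$, and $\gamma_{ij}'$ are another such set of values. Then, for all values of $i$ and $j$,

\[ \mu + \alpha_i + \beta_j + \gamma_{ij} = \mu' + \alpha_i' + \beta_j' + \gamma_{ij}'. \]

By summing both sides of this equation over $i$ and $j$, we obtain the relation $IJ\mu = IJ\mu'$. Hence, $\mu = \mu'$. It follows, therefore, that for all values of $i$ and $j$,

\[ \alpha_i + \beta_j + \gamma_{ij} = \alpha_i' + \beta_j' + \gamma_{ij}'. \]

By summing both sides of this equation over $j$, we obtain the relation $J\alpha_i = J\alpha_i'$. By summing both sides of this equation over $i$, we obtain the relation $I\beta_j = I\beta_j'$. Hence, $\alpha_i = \alpha_i'$ and $\beta_j = \beta_j'$. It also follows, therefore, that $\gamma_{ij} = \gamma_{ij}'$.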


2. Since $\bar{Y}_{ij+}$ is the M.L.E. of $\theta_{ij}$ for each $i$ and $j$, it follows from the definitions in Exercise 1 that the M.L.E.'s of the various linear functions defined in Exercise 1 are the corresponding linear functions of $\bar{Y}_{ij+}$. These are precisely the values in (11.8.6) and (11.8.7).


3. The values of $\mu$, $\alpha_i$, $\beta_j$, and $\gamma_{ij}$ can be determined in each part of this problem from the given values of $\theta_{ij}$ by applying the definitions given in the solution of Exercise 1.


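4. \[ \sum_{i=1}^{I}\hat{\alpha}_i = \sum_{i=1}^{I}(\bar{Y}_{i++} - \bar{Y}_{+++}) = I\bar{Y}_{+++} - I\bar{Y}_{+++} = 0, \]
\[ \sum_{i=1}^{I}\hat{\gamma}_{ij} = \sum_{i=1}^{I}(\bar{Y}_{ij+} - \bar{Y}_{i++} - \bar{Y}_{+j+} + \bar{Y}_{+++}) = I\bar{Y}_{+j+} - I\bar{Y}_{+++} - I\bar{Y}_{+j+} + I\bar{Y}_{+++} = 0. \]

The proofs that $\sum_{j=1}^{J}\hat{\beta}_j = 0$ and $\sum_{j=1}^{J}\hat{\gamma}_{ij} = 0$ are similar.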
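5. \[ E(\hat{\mu}) = \frac{1}{IJK}\sum_{i,j,k} E(Y_{ijk}) = \frac{1}{IJK}\sum_{i,j,k}\theta_{ij} = \frac{1}{IJ}\sum_{i,j}\theta_{ij} = \bar{\theta}_{++} = \mu, \ \text{by Exercise 1;} \]
\[ E(\hat{\alpha}_i) = \frac{1}{JK}\sum_{j,k} E(Y_{ijk}) - E(\bar{Y}_{+++}) = \frac{1}{JK}\sum_{j,k}\theta_{ij} - \bar{\theta}_{++}, \ \text{by the first part of this exercise,} \]
\[ \qquad = \frac{1}{J}\sum_{j}\theta_{ij} - \bar{\theta}_{++} = \bar{\theta}_{i+} - \bar{\theta}_{++} = \alpha_i, \ \text{by Exercise 1;} \]
\[ E(\hat{\beta}_j) = \beta_j, \ \text{by a similar argument;} \]
\[ E(\hat{\gamma}_{ij}) = \frac{1}{K}\sum_{k} E(Y_{ijk}) - E(\hat{\mu} + \hat{\alpha}_i + \hat{\beta}_j), \ \text{by Eq. (11.8.7),} \]
\[ \qquad = \theta_{ij} - \bar{\theta}_{i+} - \bar{\theta}_{+j} + \bar{\theta}_{++}, \ \text{by the previous parts of this exercise,} \]
\[ \qquad = \gamma_{ij}, \ \text{by Exercise 1.} \]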


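6. The $IJK$ random variables $Y_{ijk}$ are independent and each has variance $\sigma^2$. Hence,

\[ \mathrm{Var}(\hat{\mu}) = \mathrm{Var}\left(\frac{1}{IJK}\sum_{i,j,k} Y_{ijk}\right) = \frac{1}{(IJK)^2}\sum_{i,j,k}\mathrm{Var}(Y_{ijk}) = \frac{1}{(IJK)^2}\,IJK\,\sigma^2 = \frac{\sigma^2}{IJK}. \]

The estimator $\hat{\alpha}_i$ is a linear function of the observations $Y_{rst}$ of the form $\hat{\alpha}_i = \sum_{r,s,t} a_{rst}Y_{rst}$, where

\[ a_{ist} = \frac{1}{JK} - \frac{1}{IJK} = \frac{I-1}{IJK} \quad\text{and}\quad a_{rst} = -\frac{1}{IJK} \ \text{for } r \ne i. \]

Hence,

\[ \mathrm{Var}(\hat{\alpha}_i) = \sum_{r,s,t} a_{rst}^2\sigma^2 = \left[JK\left(\frac{I-1}{IJK}\right)^2 + (I-1)JK\left(\frac{1}{IJK}\right)^2\right]\sigma^2 = \frac{I-1}{IJK}\,\sigma^2. \]

$\mathrm{Var}(\hat{\beta}_j)$ can be determined similarly. Finally, if we represent $\hat{\gamma}_{ij}$ in the form $\hat{\gamma}_{ij} = \sum_{r,s,t} c_{rst}Y_{rst}$, then it follows from Eq. (11.8.7) that

\[ c_{ijt} = \frac{1}{K} - \frac{1}{JK} - \frac{1}{IK} + \frac{1}{IJK} \ \text{for } t = 1, \ldots, K, \]
\[ c_{rjt} = -\frac{1}{IK} + \frac{1}{IJK} \ \text{for } r \ne i \text{ and } t = 1, \ldots, K, \]
\[ c_{ist} = -\frac{1}{JK} + \frac{1}{IJK} \ \text{for } s \ne j \text{ and } t = 1, \ldots, K, \]
\[ c_{rst} = \frac{1}{IJK} \ \text{for } r \ne i,\ s \ne j, \text{ and } t = 1, \ldots, K. \]

Therefore,

\[ \mathrm{Var}(\hat{\gamma}_{ij}) = \sum_{r,s,t} c_{rst}^2\sigma^2 = \left[K\left(\frac{1}{K} - \frac{1}{JK} - \frac{1}{IK} + \frac{1}{IJK}\right)^2 + (I-1)K\left(-\frac{1}{IK} + \frac{1}{IJK}\right)^2 \right. \]
\[ \qquad \left. + (J-1)K\left(-\frac{1}{JK} + \frac{1}{IJK}\right)^2 + (I-1)(J-1)K\left(\frac{1}{IJK}\right)^2\right]\sigma^2. \]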
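
The coefficient sums in this solution are easy to check with exact rational arithmetic. The sketch below is an illustration in Python, not part of the original solution; the function name `variance_factors` is hypothetical. It returns $\mathrm{Var}(\hat{\alpha}_i)/\sigma^2$ and $\mathrm{Var}(\hat{\gamma}_{ij})/\sigma^2$ for given $I$, $J$, $K$. The closed form $(I-1)(J-1)/(IJK)$ used in the comparison for $\hat{\gamma}_{ij}$ is my own simplification of the bracketed sum and is an assumption to be verified by the code.

```python
from fractions import Fraction


def variance_factors(I, J, K):
    """Return (Var(alpha_hat_i), Var(gamma_hat_ij)) divided by sigma^2."""
    n = I * J * K

    # Coefficients of alpha_hat_i: rows r = i and rows r != i.
    a_same = Fraction(1, J * K) - Fraction(1, n)
    a_other = Fraction(-1, n)
    var_alpha = J * K * a_same ** 2 + (I - 1) * J * K * a_other ** 2

    # Coefficients of gamma_hat_ij for the four kinds of cells (r, s).
    c_ij = Fraction(1, K) - Fraction(1, J * K) - Fraction(1, I * K) + Fraction(1, n)
    c_rj = Fraction(-1, I * K) + Fraction(1, n)
    c_is = Fraction(-1, J * K) + Fraction(1, n)
    c_rs = Fraction(1, n)
    var_gamma = (K * c_ij ** 2 + (I - 1) * K * c_rj ** 2
                 + (J - 1) * K * c_is ** 2 + (I - 1) * (J - 1) * K * c_rs ** 2)
    return var_alpha, var_gamma


va, vg = variance_factors(3, 4, 2)
print(va, vg)
```

With $I = 3$, $J = 4$, $K = 2$ the first value is $1/12$, which is the variance factor used for $\hat{\alpha}_2$ later in this section.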


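7. First, write $S^2_{\mathrm{Tot}}$ as

\[ S^2_{\mathrm{Tot}} = \sum_{i,j,k}\left[(Y_{ijk} - \bar{Y}_{ij+}) + (\bar{Y}_{ij+} - \bar{Y}_{i++} - \bar{Y}_{+j+} + \bar{Y}_{+++}) + (\bar{Y}_{i++} - \bar{Y}_{+++}) + (\bar{Y}_{+j+} - \bar{Y}_{+++})\right]^2. \qquad \text{(S.11.6)} \]

The sums of squares of the grouped terms in the summation in (S.11.6) are $S^2_{\mathrm{Resid}}$, $S^2_{\mathrm{Int}}$, $S^2_A$, and $S^2_B$. Hence, to verify Eq. (11.8.9), it must be shown that the sum of each of the pairs of cross-products of the grouped terms is 0. This is verified in a manner similar to what was done in the solution to Exercise 8 in Sec. 11.7. Each of the sums of $(Y_{ijk} - \bar{Y}_{ij+})$ times one of the other terms is 0 by summing over $k$ first. The sum of the product of the last two grouped terms is 0 because the sum factors into sums over $i$ and $j$ that are each 0. The other two sums are similar, and we shall illustrate this one:

\[ \sum_{i,j,k}(\bar{Y}_{ij+} - \bar{Y}_{i++} - \bar{Y}_{+j+} + \bar{Y}_{+++})(\bar{Y}_{i++} - \bar{Y}_{+++}). \]

Summing over $j$ first produces 0 in this sum. For the other one, sum over $i$ first.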


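8. Each of the five given random variables is a linear function of the $IJK$ independent observations $Y_{rst}$. Let

\[ \hat{\alpha}_{i_1} = \sum_{r,s,t} a_{rst}Y_{rst}, \qquad \hat{\beta}_{j_1} = \sum_{r,s,t} b_{rst}Y_{rst}, \]
\[ \hat{\gamma}_{i_2 j_2} = \sum_{r,s,t} c_{rst}Y_{rst}, \qquad Y_{ijk} - \bar{Y}_{ij+} = \sum_{r,s,t} d_{rst}Y_{rst}. \]

Of course, $\hat{\mu} = \sum_{r,s,t} Y_{rst}/[IJK]$. The values of $a_{rst}$, $b_{rst}$, and $c_{rst}$ were given in the solution of Exercise 6, and

\[ d_{ijk} = 1 - \frac{1}{K}, \qquad d_{ijt} = -\frac{1}{K} \ \text{for } t \ne k, \qquad d_{rst} = 0 \ \text{otherwise.} \]

To show that $\hat{\alpha}_{i_1}$ and $\hat{\beta}_{j_1}$ are uncorrelated, for example, it must be shown, as in Exercise 9 of Sec. 11.7, that $\sum_{r,s,t} a_{rst}b_{rst} = 0$. We have

\[ \sum_{r,s,t} a_{rst}b_{rst} = K\left(\frac{I-1}{IJK}\right)\left(\frac{J-1}{IJK}\right) + (J-1)K\left(\frac{I-1}{IJK}\right)\left(-\frac{1}{IJK}\right) \]
\[ \qquad + (I-1)K\left(-\frac{1}{IJK}\right)\left(\frac{J-1}{IJK}\right) + (I-1)(J-1)K\left(-\frac{1}{IJK}\right)\left(-\frac{1}{IJK}\right) = 0. \]

Similarly, to show that $Y_{ijk} - \bar{Y}_{ij+}$ and $\hat{\gamma}_{i_2 j_2}$ are uncorrelated, it must be shown that $\sum_{r,s,t} c_{rst}d_{rst} = 0$.

Suppose first that $i = i_2$ and $j = j_2$. Then

\[ \sum_{r,s,t} c_{rst}d_{rst} = \left(\frac{1}{K} - \frac{1}{JK} - \frac{1}{IK} + \frac{1}{IJK}\right)\left(1 - \frac{1}{K}\right) + (K-1)\left(\frac{1}{K} - \frac{1}{JK} - \frac{1}{IK} + \frac{1}{IJK}\right)\left(-\frac{1}{K}\right) = 0. \]

Suppose next that $i = i_2$ and $j \ne j_2$. Then

\[ \sum_{r,s,t} c_{rst}d_{rst} = \left(-\frac{1}{JK} + \frac{1}{IJK}\right)\left(1 - \frac{1}{K}\right) + (K-1)\left(-\frac{1}{JK} + \frac{1}{IJK}\right)\left(-\frac{1}{K}\right) = 0. \]

The other cases can be treated similarly, and it can be shown that the five given random variables are uncorrelated with one another regardless of whether any of the values $j$, $j_1$, and $j_2$ are equal.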


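9. We shall first show that the numerators are equal:

\[ \sum_{i,j}(\bar{Y}_{ij+} - \bar{Y}_{i++} - \bar{Y}_{+j+} + \bar{Y}_{+++})^2 = \sum_{i,j}\left(\bar{Y}_{ij+}^2 + \bar{Y}_{i++}^2 + \bar{Y}_{+j+}^2 + \bar{Y}_{+++}^2 \right. \]
\[ \qquad - 2\bar{Y}_{ij+}\bar{Y}_{i++} - 2\bar{Y}_{ij+}\bar{Y}_{+j+} + 2\bar{Y}_{ij+}\bar{Y}_{+++} \]
\[ \qquad \left. + 2\bar{Y}_{i++}\bar{Y}_{+j+} - 2\bar{Y}_{i++}\bar{Y}_{+++} - 2\bar{Y}_{+j+}\bar{Y}_{+++}\right) \]
\[ = \sum_{i,j}\bar{Y}_{ij+}^2 + J\sum_i \bar{Y}_{i++}^2 + I\sum_j \bar{Y}_{+j+}^2 + IJ\bar{Y}_{+++}^2 \]
\[ \qquad - 2J\sum_i \bar{Y}_{i++}^2 - 2I\sum_j \bar{Y}_{+j+}^2 + 2IJ\bar{Y}_{+++}^2 \]
\[ \qquad + 2IJ\bar{Y}_{+++}^2 - 2IJ\bar{Y}_{+++}^2 - 2IJ\bar{Y}_{+++}^2 \]
\[ = \sum_{i,j}\bar{Y}_{ij+}^2 - J\sum_i \bar{Y}_{i++}^2 - I\sum_j \bar{Y}_{+j+}^2 + IJ\bar{Y}_{+++}^2. \]

Next, we shall show that the denominators are equal:

\[ \sum_{i,j,k}(Y_{ijk} - \bar{Y}_{ij+})^2 = \sum_{i,j,k}\left(Y_{ijk}^2 - 2Y_{ijk}\bar{Y}_{ij+} + \bar{Y}_{ij+}^2\right) \]
\[ = \sum_{i,j}\left(\sum_k Y_{ijk}^2 - 2K\bar{Y}_{ij+}^2 + K\bar{Y}_{ij+}^2\right) \]
\[ = \sum_{i,j,k} Y_{ijk}^2 - K\sum_{i,j}\bar{Y}_{ij+}^2. \]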


10. In this problem, $I = 3$, $J = 4$, and $K = 2$. The values of the estimates can be calculated directly from Eqs. (11.8.6), (11.8.7), and (11.8.3).
11. It is found from Eq. (11.8.12) that $U_{AB}^2 = 0.7047$. When the hypothesis is true, $U_{AB}^2$ has the $F$ distribution with $(I-1)(J-1) = 6$ and $IJ(K-1) = 12$ degrees of freedom. The tail area corresponding to the value just calculated is found to be greater than 0.05.


12. Since the hypothesis in Exercise 11 was not rejected, we proceed to test the hypotheses (11.8.13). It is found from Eq. (11.8.14) that $U_A^2 = 7.5245$. When the hypothesis is true, $U_A^2$ has the $F$ distribution with $(I-1)J = 8$ and 12 degrees of freedom. The tail area corresponding to the value just calculated is found to be less than 0.025.


13. It is found from Eq. (11.8.18) that $U_B^2 = 9.0657$. When the hypothesis is true, $U_B^2$ has the $F$ distribution with $I(J-1) = 9$ and 12 degrees of freedom. The tail area corresponding to the value just calculated is found to be less than 0.025.


14. The estimator $\hat{\mu}$ has the normal distribution with mean $\mu$ and, by Exercise 6, variance $\sigma^2/24$. Also, $\sum_{i,j,k}(Y_{ijk} - \bar{Y}_{ij+})^2/\sigma^2$ has a $\chi^2$ distribution with 12 degrees of freedom, and these two random variables are independent. Therefore, when $H_0$ is true, the following statistic $V$ will have the $t$ distribution with 12 degrees of freedom:

\[ V = \frac{\sqrt{24}\,(\hat{\mu} - 8)}{\left[\frac{1}{12}\sum_{i,j,k}(Y_{ijk} - \bar{Y}_{ij+})^2\right]^{1/2}}. \]

We could test the given hypotheses by carrying out a two-sided $t$ test using the statistic $V$. Equivalently, as described in Sec. 9.7, $V^2$ will have the $F$ distribution with 1 and 12 degrees of freedom. It is found that

\[ V^2 = \frac{24(0.7708)^2}{\frac{1}{12}(10.295)} = 16.6221. \]

The corresponding tail area is less than 0.025.


15. The estimator $\hat{\alpha}_2$ has the normal distribution with mean $\alpha_2$ and, by Exercise 6, variance $\sigma^2/12$. Hence, as in the solution of Exercise 14, when $\alpha_2 = 1$, the following statistic $V$ will have the $t$ distribution with 12 degrees of freedom:

\[ V = \frac{\sqrt{12}\,(\hat{\alpha}_2 - 1)}{\left[\frac{1}{12}\sum_{i,j,k}(Y_{ijk} - \bar{Y}_{ij+})^2\right]^{1/2}}. \]

The null hypothesis $H_0$ should be rejected if $V > c$, where $c$ is an appropriate constant. It is found that

\[ V = \frac{\sqrt{12}\,(0.7667)}{\left[\frac{1}{12}(10.295)\right]^{1/2}} = 2.8673. \]

The corresponding tail area is between 0.005 and 0.01.
16. Since $E(Y_{ijk}) = \mu + \alpha_i + \beta_j + \gamma_{ij}$, then $E(\bar{Y}_{ij+}) = \mu + \alpha_i + \beta_j + \gamma_{ij}$. The desired results can now be obtained from Eq. (11.8.19) and Eq. (11.8.5).


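17. \[ \sum_{i=1}^{I}\hat{\alpha}_i = \frac{1}{J}\sum_{i=1}^{I}\sum_{j=1}^{J}\bar{Y}_{ij+} - I\hat{\mu} = I\hat{\mu} - I\hat{\mu} = 0 \quad\text{and} \]
\[ \sum_{i=1}^{I}\hat{\gamma}_{ij} = \left(\sum_{i=1}^{I}\bar{Y}_{ij+} - I\hat{\mu}\right) - \sum_{i=1}^{I}\hat{\alpha}_i - I\hat{\beta}_j = I\hat{\beta}_j - 0 - I\hat{\beta}_j = 0. \]

It can be shown similarly that

\[ \sum_{j=1}^{J}\hat{\beta}_j = 0 \quad\text{and}\quad \sum_{j=1}^{J}\hat{\gamma}_{ij} = 0. \]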


18. Both $\hat{\mu}$ and $\hat{\alpha}_i$ are linear functions of the $\sum_{i=1}^{I}\sum_{j=1}^{J} K_{ij}$ independent random variables $Y_{ijk}$. Let $\hat{\mu} = \sum_{r,s,t} m_{rst}Y_{rst}$ and $\hat{\alpha}_i = \sum_{r,s,t} a_{rst}Y_{rst}$. Then it is found from Eq. (11.8.19) that

\[ m_{rst} = \frac{1}{IJK_{rs}} \ \text{for all values of } r, s, \text{ and } t, \]

and

\[ a_{ist} = \frac{1}{JK_{is}} - \frac{1}{IJK_{is}}, \qquad a_{rst} = -\frac{1}{IJK_{rs}} \ \text{for } r \ne i. \]

As in the solution of Exercise 8,

\[ \mathrm{Cov}(\hat{\mu}, \hat{\alpha}_i) = \sigma^2\sum_{r,s,t} m_{rst}a_{rst} \]
\[ = \sigma^2\left[\sum_{s=1}^{J} K_{is}\,\frac{1}{IJK_{is}}\left(\frac{1}{JK_{is}} - \frac{1}{IJK_{is}}\right) - \sum_{r \ne i}\sum_{s=1}^{J} K_{rs}\,\frac{1}{IJK_{rs}}\cdot\frac{1}{IJK_{rs}}\right] \]
\[ = \sigma^2\left[\frac{I-1}{I^2J^2}\sum_{s=1}^{J}\frac{1}{K_{is}} - \frac{1}{I^2J^2}\sum_{r \ne i}\sum_{s=1}^{J}\frac{1}{K_{rs}}\right] \]
\[ = \frac{\sigma^2}{I^2J^2}\left[I\sum_{s=1}^{J}\frac{1}{K_{is}} - \sum_{r=1}^{I}\sum_{s=1}^{J}\frac{1}{K_{rs}}\right]. \]


19. Notice that we cannot reject the second null hypothesis unless we accept the first null hypothesis, since we don't even test the second hypothesis if we reject the first one. The probability that the two-stage procedure rejects at least one of the two hypotheses is then

Pr(reject first null hypothesis)
+ Pr(reject second null hypothesis and accept first null hypothesis).

The first term above is $\alpha_0$, and the second term can be rewritten as

Pr(reject second null hypothesis | accept first null hypothesis)
× Pr(accept first null hypothesis).

This product equals $\beta_0(1 - \alpha_0)$, hence the overall probability is $\alpha_0 + \beta_0(1 - \alpha_0)$.
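
The overall probability for the two-stage procedure can be illustrated numerically. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, and the simulation treats the second-stage rejection as an event of conditional probability $\beta_0$ given that the first stage accepts, exactly as in the argument above. The levels 0.05 and 0.05 are example values only.

```python
import random


def overall_rejection_probability(alpha0, beta0):
    """Probability that the two-stage procedure rejects at least one hypothesis."""
    return alpha0 + beta0 * (1 - alpha0)


def simulate_two_stage(alpha0, beta0, n_runs, rng):
    """Monte Carlo estimate of the same probability."""
    rejections = 0
    for _ in range(n_runs):
        if rng.random() < alpha0:        # first hypothesis rejected; stop
            rejections += 1
        elif rng.random() < beta0:       # second test is run only after acceptance
            rejections += 1
    return rejections / n_runs


exact = overall_rejection_probability(0.05, 0.05)
estimate = simulate_two_stage(0.05, 0.05, 100_000, random.Random(1))
print(exact, estimate)
```

With both levels equal to 0.05, the overall probability is 0.0975, which is why the individual levels must be set below the desired overall size.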


20. (a) The three additional cell averages are 822.5, 821.7, and 770. The ANOVA table for the combined samples is

Source of variation        Degrees of freedom   Sum of squares   Mean square
Main effects of filter              1                 1003           1003
Main effects of size                2                25817          12908
Interactions                        2                  739          369.4
Residuals                          30                 1992          66.39
Total                              35                29551

(b) The F statistic for the test of no interaction is 369.4/66.39 = 5.56. Comparing this to the F distribution with 2 and 30 degrees of freedom, we get a p-value of 0.009.

(c) If we use the one-stage test procedure in which both the main effects and interactions are hypothesized to be 0 together, we get an F statistic equal to [(25817 + 739)/4]/66.39 = 100 with 4 and 30 degrees of freedom. The p-value is essentially 0.

(d) If we use the one-stage test procedure in which both the main effects and interactions are hypothesized to be 0 together, we get an F statistic equal to [(1003 + 739)/3]/66.39 = 8.75 with 3 and 30 degrees of freedom. The p-value is 0.0003.
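
The F statistics in parts (b) through (d) are ratios of pooled mean squares to the residual mean square, so they can be recomputed from the ANOVA table. The sketch below is an illustrative Python calculation, not part of the original solution; the dictionary layout and the function name `pooled_f` are assumptions. It uses the rounded table entries, so the results agree with the text only to about two decimal places, and it does not compute p-values.

```python
# (sum of squares, degrees of freedom) taken from the ANOVA table.
anova = {
    "filter": (1003, 1),
    "size": (25817, 2),
    "interaction": (739, 2),
}
ms_resid = 66.39   # residual mean square, with 30 degrees of freedom


def pooled_f(sources):
    """Pool the listed sources and return (F statistic, numerator d.f.)."""
    ss = sum(anova[name][0] for name in sources)
    df = sum(anova[name][1] for name in sources)
    return (ss / df) / ms_resid, df


f_interaction, df_interaction = pooled_f(["interaction"])
f_size, df_size = pooled_f(["size", "interaction"])
f_filter, df_filter = pooled_f(["filter", "interaction"])
print(f_interaction, df_interaction)
print(f_size, df_size)
print(f_filter, df_filter)
```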


11.9 Supplementary Exercises 


Solutions to Exercises 


1. The necessary calculations were done in Example 11.3.6. The least-squares coefficients are $\hat{\beta}_0 = -0.9709$ and $\hat{\beta}_1 = 0.0206$, with $\sigma' = 8.730 \times 10^{-3}$, and $n = 17$. We also can compute $s_x^2 = 530.8$ and $\bar{x}_n = 203.0$.

(a) A 90% confidence interval for $\beta_1$ is $\hat{\beta}_1 \pm T_{15}^{-1}(0.95)\,\sigma'/s_x$. This becomes (0.01996, 0.02129).

(b) Since 0 is not in the 90% interval in part (a), we would reject $H_0$ at level $\alpha_0 = 0.1$.

(c) A 90% prediction interval for log-pressure at boiling-point equal to $x$ is

\[ \hat{\beta}_0 + x\hat{\beta}_1 \pm T_{15}^{-1}(0.95)\,\sigma'\left(1 + \frac{1}{n} + \frac{(x - \bar{x}_n)^2}{s_x^2}\right)^{1/2}. \]

With the data we have, this gives [3.233, 3.264]. Converting this to pressure gives (25.35, 26.16).
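
The interval in part (a) can be recomputed from the summary statistics. The following Python sketch is an illustration under stated assumptions: the $t$ quantile $T_{15}^{-1}(0.95) \approx 1.753$ is hard-coded from a standard $t$ table rather than computed, and the slope is entered as the rounded value 0.0206, so the endpoints match the interval in the text only to about four decimal places.

```python
import math

beta1_hat = 0.0206        # rounded least-squares slope
sigma_prime = 8.730e-3    # sigma' from the solution
s_x_squared = 530.8       # sum of squared deviations of the x values
t_quantile = 1.753        # assumed value of the 0.95 quantile of t with 15 d.f.

half_width = t_quantile * sigma_prime / math.sqrt(s_x_squared)
lower = beta1_hat - half_width
upper = beta1_hat + half_width
print(f"({lower:.5f}, {upper:.5f})")

# Part (b): reject H0: beta1 = 0 at level 0.1 when 0 lies outside the interval.
reject_h0 = not (lower <= 0.0 <= upper)
```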


2. This result follows directly from the expressions for $\hat{\rho}$, $\hat{\sigma}_1$, and $\hat{\sigma}_2$ given in Exercise 24 of Sec. 7.6 and the expression for $\hat{\beta}_1$ given in Exercise 2a of Sec. 11.1.


3. The conditional distribution of $Y_i$ given $X_i = x_i$ has mean $\beta_0 + \beta_1 x_i$, where

\[ \beta_0 = \mu_2 - \frac{\rho\sigma_2}{\sigma_1}\mu_1 \quad\text{and}\quad \beta_1 = \frac{\rho\sigma_2}{\sigma_1}, \]

and variance $(1 - \rho^2)\sigma_2^2$. Since $T = \hat{\beta}_1$, as given in Exercise 2b of Sec. 11.1, it follows that $E(T) = \beta_1 = \rho\sigma_2/\sigma_1$ and

\[ \mathrm{Var}(T) = \frac{(1 - \rho^2)\sigma_2^2}{\sum_{i=1}^{n}(x_i - \bar{x}_n)^2}. \]


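4. The least squares estimates will be the values of $\theta_1$, $\theta_2$, and $\theta_3$ that minimize $Q = \sum_{i=1}^{3}(y_i - \theta_i)^2$, where $\theta_3 = 180 - \theta_1 - \theta_2$. If we solve the equations $\partial Q/\partial\theta_1 = 0$ and $\partial Q/\partial\theta_2 = 0$, we obtain the relations

\[ y_1 - \theta_1 = y_2 - \theta_2 = y_3 - \theta_3. \]

Since $\sum_{i=1}^{3} y_i = 186$ and $\sum_{i=1}^{3}\theta_i = 180$, it follows that $\hat{\theta}_i = y_i - 2$ for $i = 1, 2, 3$. Hence $\hat{\theta}_1 = 81$, $\hat{\theta}_2 = 45$, and $\hat{\theta}_3 = 54$.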


5. This result can be established from the formulas for the least squares line given in Sec. 11.1 or directly from the following reasoning: Let $x_1 = a$ and $x_2 = b$. The data contain one observation $(a, y_1)$ at $x = a$ and $n - 1$ observations $(b, y_2), \ldots, (b, y_n)$ at $x = b$. Let $u$ denote the average of the $n - 1$ values $y_2, \ldots, y_n$, and let $h_a$ and $h_b$ denote the height of the least squares line at $x = a$ and $x = b$, respectively. Then the value of $Q$, as given by Eq. (11.1.2), is

\[ Q = (y_1 - h_a)^2 + \sum_{j=2}^{n}(y_j - h_b)^2. \]

The first term is minimized by taking $h_a = y_1$ and the summation is minimized by taking $h_b = u$. Hence, $Q$ is minimized by passing the straight line through the two points $(a, y_1)$ and $(b, u)$. But $(a, y_1)$ is the point $(x_1, y_1)$.


6. The first line is the usual least squares line $y = \hat{\beta}_0 + \hat{\beta}_1 x$, where $\hat{\beta}_1$ is given in Exercise 2a of Sec. 11.1. In the second line, the roles of $x$ and $y$ are interchanged, so it is $x = \hat{\alpha}_1 + \hat{\alpha}_2 y$, where

\[ \hat{\alpha}_2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x}_n)(y_i - \bar{y}_n)}{\sum_{i=1}^{n}(y_i - \bar{y}_n)^2}. \]

Both lines pass through the point $(\bar{x}_n, \bar{y}_n)$, so they will coincide if and only if they have the same slope; i.e., if and only if $\hat{\beta}_1 = 1/\hat{\alpha}_2$. This condition reduces to the condition that $\hat{\rho}^2 = 1$, where $\hat{\rho}$ is given in Exercise 24 of Sec. 7.6 and is the sample correlation coefficient. But $\hat{\rho}^2 = 1$ if and only if the $n$ points lie exactly on a straight line. Hence, the two least squares lines will coincide if and only if all $n$ points lie exactly on a straight line.


7. It is found from standard calculus texts that the sum of the squared distances from the points to the line is

\[ Q = \frac{\sum_{i=1}^{n}(y_i - \beta_1 - \beta_2 x_i)^2}{1 + \beta_2^2}. \]

The equation $\partial Q/\partial\beta_1 = 0$ reduces to the relation $\beta_1 = \bar{y}_n - \beta_2\bar{x}_n$. If we replace $\beta_1$ in the equation $\partial Q/\partial\beta_2 = 0$ by this quantity, we obtain the relation:

\[ (1 + \beta_2^2)\sum_{i=1}^{n}[(y_i - \bar{y}_n) - \beta_2(x_i - \bar{x}_n)]x_i + \beta_2\sum_{i=1}^{n}[(y_i - \bar{y}_n) - \beta_2(x_i - \bar{x}_n)]^2 = 0. \]

Note that we can replace the factor $x_i$ in the first summation by $x_i - \bar{x}_n$ without changing the value of the summation. If we then let $x_i' = x_i - \bar{x}_n$ and $y_i' = y_i - \bar{y}_n$, and expand the final squared term, we obtain the following relation after some algebra:

\[ (\beta_2^2 - 1)\sum_{i=1}^{n} x_i'y_i' + \beta_2\sum_{i=1}^{n}(x_i'^2 - y_i'^2) = 0. \]

Hence

\[ \beta_2 = \frac{\sum_{i=1}^{n}(y_i'^2 - x_i'^2) \pm \left[\left(\sum_{i=1}^{n}(y_i'^2 - x_i'^2)\right)^2 + 4\left(\sum_{i=1}^{n} x_i'y_i'\right)^2\right]^{1/2}}{2\sum_{i=1}^{n} x_i'y_i'}. \]

Either the plus sign or the minus sign should be used, depending on whether the optimal line has positive or negative slope.
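
The choice between the two roots can be made mechanically by evaluating $Q$ at both candidates. The sketch below is an illustration in Python, not part of the original solution; the function name `orthogonal_line` and the sample points are hypothetical. It computes both roots of the quadratic in $\beta_2$ and keeps the one with the smaller sum of squared perpendicular distances.

```python
import math


def orthogonal_line(xs, ys):
    """Return (beta1, beta2) minimizing the sum of squared perpendicular distances."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    if sxy == 0:
        raise ValueError("the quadratic formula needs a nonzero cross-product sum")

    root = math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)
    candidates = [((syy - sxx) + sign * root) / (2 * sxy) for sign in (1, -1)]

    def q(beta2):
        beta1 = ybar - beta2 * xbar
        residuals = sum((y - beta1 - beta2 * x) ** 2 for x, y in zip(xs, ys))
        return residuals / (1 + beta2 ** 2)

    beta2 = min(candidates, key=q)
    return ybar - beta2 * xbar, beta2


# Points lying exactly on y = 1 + 2x should recover intercept 1 and slope 2.
intercept, slope = orthogonal_line([0, 1, 2, 3], [1, 3, 5, 7])
print(intercept, slope)
```

The discarded root is the slope of the perpendicular line through $(\bar{x}_n, \bar{y}_n)$, which maximizes rather than minimizes $Q$.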


8. This phenomenon was discussed in Exercise 19 of Sec. 11.2. The conditional expectation $E(X_2|X_1)$ of the sister's score $X_2$ given the first twin's score $X_1$ can be derived from Eq. (5.10.6) with $\mu_1 = \mu_2 = \mu$ and $\sigma_1 = \sigma_2 = \sigma$. Hence,

\[ E(X_2|X_1) = \mu + \rho(X_1 - \mu) = (1 - \rho)\mu + \rho X_1, \]

which is between $\mu$ and $X_1$. The same holds with subscripts 1 and 2 switched.


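9. \[ v = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_{i+} + \bar{x}_{i+} - \bar{x}_{++})^2 \]
\[ = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_{i+})^2 + \frac{1}{n}\sum_{i=1}^{k} n_i(\bar{x}_{i+} - \bar{x}_{++})^2 \]
\[ = \frac{1}{n}\sum_{i=1}^{k} n_i\left[v_i + (\bar{x}_{i+} - \bar{x}_{++})^2\right]. \]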


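10. In the notation of Sec. 11.5, the design matrix is

\[ Z = \begin{bmatrix} w_1 & x_1 \\ \vdots & \vdots \\ w_n & x_n \end{bmatrix}. \]

For $t = w, x, Y$ and $u = w, x, Y$, let $S_{tu} = \sum_{i=1}^{n} t_i u_i$. Then

\[ Z'Z = \begin{bmatrix} S_{ww} & S_{wx} \\ S_{wx} & S_{xx} \end{bmatrix}, \qquad Z'Y = \begin{bmatrix} S_{wY} \\ S_{xY} \end{bmatrix}, \qquad (Z'Z)^{-1} = \frac{1}{S_{ww}S_{xx} - S_{wx}^2}\begin{bmatrix} S_{xx} & -S_{wx} \\ -S_{wx} & S_{ww} \end{bmatrix}. \]

Hence,

\[ (Z'Z)^{-1}Z'Y = \frac{1}{S_{ww}S_{xx} - S_{wx}^2}\begin{bmatrix} S_{xx}S_{wY} - S_{wx}S_{xY} \\ S_{ww}S_{xY} - S_{wx}S_{wY} \end{bmatrix}. \]

The first component on the right side of this equation is $\hat{\beta}_0$ and the second is $\hat{\beta}_1$.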


11. It was shown in Sec. 11.8 that the quantity $S^2_{\mathrm{Resid}}/\sigma^2$ given in Eq. (11.8.10) has a $\chi^2$ distribution with $IJ(K-1)$ degrees of freedom. Hence, the random variable $S^2_{\mathrm{Resid}}/[IJ(K-1)]$ is an unbiased estimator of $\sigma^2$.


12. It follows from Table 11.23 that if $\alpha_i = \beta_j = 0$ for $i = 1, \ldots, I$ and $j = 1, \ldots, J$, and $Q = S_A^2 + S_B^2$, then $Q/\sigma^2$ will have a $\chi^2$ distribution with $(I-1) + (J-1) = I + J - 2$ degrees of freedom. Furthermore, regardless of the values of $\alpha_i$ and $\beta_j$, $R = S^2_{\mathrm{Resid}}/\sigma^2$ will have a $\chi^2$ distribution with $(I-1)(J-1)$ degrees of freedom, and $Q$ and $R$ will be independent. Hence, under $H_0$, the statistic

\[ U = \frac{(I-1)(J-1)\,Q/\sigma^2}{(I + J - 2)\,R} = \frac{(I-1)(J-1)(S_A^2 + S_B^2)}{(I + J - 2)\,S^2_{\mathrm{Resid}}} \]

will have the $F$ distribution with $I + J - 2$ and $(I-1)(J-1)$ degrees of freedom. The null hypothesis $H_0$ should be rejected if $U > c$.


13. Suppose that $\alpha_i = \beta_j = \gamma_{ij} = 0$ for all values of $i$ and $j$. Then it follows from Table 11.28 that $(S_A^2 + S_B^2 + S^2_{\mathrm{Int}})/\sigma^2$ will have a $\chi^2$ distribution with $(I-1) + (J-1) + (I-1)(J-1) = IJ - 1$ degrees of freedom. Furthermore, regardless of the values of $\alpha_i$, $\beta_j$, and $\gamma_{ij}$, $S^2_{\mathrm{Resid}}/\sigma^2$ will have a $\chi^2$ distribution with $IJ(K-1)$ degrees of freedom, and $S_A^2 + S_B^2 + S^2_{\mathrm{Int}}$ and $S^2_{\mathrm{Resid}}$ will be independent. Hence, under $H_0$, the statistic

\[ U = \frac{IJ(K-1)(S_A^2 + S_B^2 + S^2_{\mathrm{Int}})}{(IJ - 1)\,S^2_{\mathrm{Resid}}} \]

will have the $F$ distribution with $IJ - 1$ and $IJ(K-1)$ degrees of freedom. The null hypothesis $H_0$ should be rejected if $U > c$.


14. The design in this exercise is a two-way layout with two levels of each factor and $K$ observations in each cell. The hypothesis $H_0$ is precisely the hypothesis $H_0$ given in (11.8.11) that the effects of the two factors are additive and all interactions are 0. Hence, $H_0$ should be rejected if $U_{AB}^2 > c$, where $U_{AB}^2$ is given by (11.8.12) with $I = J = 2$, and $U_{AB}^2$ has an $F$ distribution with 1 and $4(K-1)$ degrees of freedom when $H_0$ is true.


Copyright © 2012 Pearson Education, Inc. Publishing as Addison-Wesley. 


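15. Let $Y_1 = W_1$, $Y_2 = W_2 - 5$, and $Y_3 = \frac{1}{2}W_3$. Then the random vector

\[ Y = \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \end{bmatrix} \]

satisfies the conditions of the general linear model as described in Sec. 11.5 with

\[ Z = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & -1 \end{bmatrix}, \qquad \beta = \begin{bmatrix} \theta_1 \\ \theta_2 \end{bmatrix}. \]

Thus,

\[ Z'Z = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}, \qquad (Z'Z)^{-1} = \begin{bmatrix} 3/8 & -1/8 \\ -1/8 & 3/8 \end{bmatrix}, \]

and

\[ \hat{\beta} = \begin{bmatrix} \hat{\theta}_1 \\ \hat{\theta}_2 \end{bmatrix} = (Z'Z)^{-1}Z'Y = \begin{bmatrix} \frac{1}{4}Y_1 + \frac{1}{4}Y_2 + \frac{1}{2}Y_3 \\ \frac{1}{4}Y_1 + \frac{1}{4}Y_2 - \frac{1}{2}Y_3 \end{bmatrix}. \]

Also,

\[ \hat{\sigma}^2 = \frac{1}{3}(Y - Z\hat{\beta})'(Y - Z\hat{\beta}) = \frac{1}{3}\left[(Y_1 - \hat{\theta}_1 - \hat{\theta}_2)^2 + (Y_2 - \hat{\theta}_1 - \hat{\theta}_2)^2 + (Y_3 - \hat{\theta}_1 + \hat{\theta}_2)^2\right]. \]

The following distributional properties of these M.L.E.'s are known from Sec. 11.5: $(\hat{\theta}_1, \hat{\theta}_2)$ and $\hat{\sigma}^2$ are independent; $(\hat{\theta}_1, \hat{\theta}_2)$ has a bivariate normal distribution with mean vector $(\theta_1, \theta_2)$ and covariance matrix

\[ \sigma^2(Z'Z)^{-1} = \begin{bmatrix} 3/8 & -1/8 \\ -1/8 & 3/8 \end{bmatrix}\sigma^2; \]

$3\hat{\sigma}^2/\sigma^2$ has a $\chi^2$ distribution with one degree of freedom.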
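
The matrix algebra in this solution is small enough to verify with exact fractions. The sketch below is an illustrative Python check, not part of the original solution. It assumes the design matrix has rows $(1, 1)$, $(1, 1)$, and $(1, -1)$, which is what the three squared residual terms in $\hat{\sigma}^2$ imply, and it inverts the $2 \times 2$ matrix $Z'Z$ with the cofactor formula.

```python
from fractions import Fraction

# Assumed design matrix: rows correspond to Y1, Y2, Y3.
Z = [[1, 1], [1, 1], [1, -1]]

# Z'Z as a 2 x 2 matrix of integers.
ztz = [[sum(row[a] * row[b] for row in Z) for b in range(2)] for a in range(2)]

# Inverse of a 2 x 2 matrix via its determinant.
det = ztz[0][0] * ztz[1][1] - ztz[0][1] * ztz[1][0]
inv = [
    [Fraction(ztz[1][1], det), Fraction(-ztz[0][1], det)],
    [Fraction(-ztz[1][0], det), Fraction(ztz[0][0], det)],
]

# Rows of (Z'Z)^{-1} Z' give the weights on Y1, Y2, Y3 in each estimator.
weights = [[sum(inv[a][b] * Z[k][b] for b in range(2)) for k in range(3)]
           for a in range(2)]

print(inv)
print(weights)
```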


16. Direct application of the theory of least squares would require choosing $\alpha$ and $\beta$ to minimize

\[ Q_1 = \sum_{i=1}^{n}(y_i - \alpha x_i^{\beta})^2. \]

This minimization must be carried out numerically since the solution cannot be found in closed form. However, if we express the required curve in the form $\log y = \log\alpha + \beta\log x$, and then apply the method of least squares, we must choose $\beta_0$ and $\beta_1$ to minimize

\[ Q_2 = \sum_{i=1}^{n}(\log y_i - \beta_0 - \beta_1\log x_i)^2, \]

where $\beta_0 = \log\alpha$ and $\beta_1 = \beta$. The least squares estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ can now be found as in Sec. 11.1, based on the values of $\log y_i$ and $\log x_i$. Estimates of $\alpha$ and $\beta$ can then be obtained from the relations $\log\hat{\alpha} = \hat{\beta}_0$ and $\hat{\beta} = \hat{\beta}_1$. It should be emphasized that these values will not be the same as the least squares estimates found by minimizing $Q_1$ directly.

The appropriateness of each of these methods depends on the appropriateness of minimizing $Q_1$ and $Q_2$. The first method is appropriate if $Y_i = \alpha x_i^{\beta} + \varepsilon_i$, where $\varepsilon_i$ has a normal distribution with mean 0 and variance $\sigma^2$. The second method is appropriate if $Y_i = \alpha x_i^{\beta}\varepsilon_i$, where $\log\varepsilon_i$ has the normal distribution just described.
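
The second method, least squares on the logarithms, needs only the simple linear regression formulas. The following is a minimal Python sketch of that approach; the function name `fit_power_law` and the sample data are hypothetical, and the data are generated without noise so that the fit recovers the parameters exactly. It does not implement the direct numerical minimization of $Q_1$.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = alpha * x**beta by least squares on (log x, log y)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n

    # Usual least-squares slope and intercept for the transformed data.
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x

    # Back-transform: log(alpha_hat) = beta0_hat and beta_hat = beta1_hat.
    return math.exp(beta0), beta1


xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0 * x ** 1.5 for x in xs]   # noise-free example data
alpha_hat, beta_hat = fit_power_law(xs, ys)
print(alpha_hat, beta_hat)
```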


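17. It follows from the expressions for $\hat{\beta}_0$ and $\hat{\beta}_1$ given by Eqs. (11.1.1) and (11.2.7) that

\[ e_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i = Y_i - \bar{Y}_n - \hat{\beta}_1(x_i - \bar{x}_n) \]
\[ = Y_i - \frac{1}{n}\sum_{j=1}^{n} Y_j - \frac{x_i - \bar{x}_n}{s_x^2}\sum_{j=1}^{n}(x_j - \bar{x}_n)Y_j \]
\[ = Y_i\left[1 - \frac{1}{n} - \frac{(x_i - \bar{x}_n)^2}{s_x^2}\right] - \sum_{j \ne i} Y_j\left[\frac{1}{n} + \frac{(x_i - \bar{x}_n)(x_j - \bar{x}_n)}{s_x^2}\right], \]

where $s_x^2 = \sum_{j=1}^{n}(x_j - \bar{x}_n)^2$. Since $Y_1, \ldots, Y_n$ are independent and each has variance $\sigma^2$, it follows that

\[ \mathrm{Var}(e_i) = \sigma^2\left[1 - \frac{1}{n} - \frac{(x_i - \bar{x}_n)^2}{s_x^2}\right]^2 + \sigma^2\sum_{j \ne i}\left[\frac{1}{n} + \frac{(x_i - \bar{x}_n)(x_j - \bar{x}_n)}{s_x^2}\right]^2. \]

Let $Q_i = \frac{1}{n} + \frac{(x_i - \bar{x}_n)^2}{s_x^2}$. Then

\[ \mathrm{Var}(e_i) = \sigma^2(1 - Q_i)^2 + \sigma^2\sum_{j \ne i}\left[\frac{1}{n} + \frac{(x_i - \bar{x}_n)(x_j - \bar{x}_n)}{s_x^2}\right]^2 \]
\[ = \sigma^2\left[(1 - Q_i)^2 + Q_i - Q_i^2\right] \]
\[ = \sigma^2(1 - Q_i). \]

(This result could also have been obtained from the more general result to be obtained next in Exercise 18.) Since $Q_i$ is an increasing function of $(x_i - \bar{x}_n)^2$, it follows that $\mathrm{Var}(e_i)$ is a decreasing function of $(x_i - \bar{x}_n)^2$ and, hence, of the distance between $x_i$ and $\bar{x}_n$.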


18. (a) Since $\hat{\beta}$ has the form given in Eq. (11.5.10), it follows directly that $Y - Z\hat{\beta}$ has the specified form.

(b) Let $A = Z(Z'Z)^{-1}Z'$. It can be verified directly that $A$ is idempotent, i.e., $AA = A$. Since $D = I - A$, it now follows that

\[ DD = (I - A)(I - A) = II - AI - IA + AA = I - A - A + A = I - A = D. \]

(c) As stated in Eq. (11.5.15), $\mathrm{Cov}(Y) = \sigma^2 I$. Hence, by Theorem 11.5.2, $\mathrm{Cov}(W) = \mathrm{Cov}(DY) = D\,\mathrm{Cov}(Y)\,D' = D(\sigma^2 I)D = \sigma^2(DD) = \sigma^2 D$.
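
For simple linear regression, the idempotence of $D$ can be checked exactly on a small example. The Python sketch below is an illustration under stated assumptions: it uses the standard closed form of the entries of $A = Z(Z'Z)^{-1}Z'$ when $Z$ has a column of ones and a column of $x$ values, namely $1/n + (x_i - \bar{x}_n)(x_j - \bar{x}_n)/s_x^2$, instead of inverting $Z'Z$, and the $x$ values are hypothetical.

```python
from fractions import Fraction


def residual_matrix(xs):
    """Return D = I - A for simple linear regression on the given x values."""
    n = len(xs)
    xbar = Fraction(sum(xs), n)
    sx2 = sum((x - xbar) ** 2 for x in xs)
    # Entry (i, j) of A is 1/n + (x_i - xbar)(x_j - xbar)/s_x^2.
    a = [[Fraction(1, n) + (xi - xbar) * (xj - xbar) / sx2 for xj in xs]
         for xi in xs]
    return [[(1 if i == j else 0) - a[i][j] for j in range(n)] for i in range(n)]


def matmul(p, q):
    """Multiply two square matrices stored as lists of rows."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


D = residual_matrix([1, 2, 4, 7])
DD = matmul(D, D)
print(DD == D)                           # exact check that D is idempotent
print([D[i][i] for i in range(4)])       # Var(e_i)/sigma^2 for each observation
```

The diagonal entries of $D$ are the factors multiplying $\sigma^2$ in $\mathrm{Cov}(W) = \sigma^2 D$, so they shrink as $x_i$ moves away from $\bar{x}_n$.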
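19. Let $\bar{\theta} = \sum_{i=1}^{I} v_i\theta_i/v_+$ and $\bar{\psi} = \sum_{j=1}^{J} w_j\psi_j/w_+$, and define $\mu = \bar{\theta} + \bar{\psi}$, $\alpha_i = \theta_i - \bar{\theta}$, and $\beta_j = \psi_j - \bar{\psi}$. Then $E(Y_{ij}) = \theta_i + \psi_j = \mu + \alpha_i + \beta_j$ and $\sum_{i=1}^{I} v_i\alpha_i = \sum_{j=1}^{J} w_j\beta_j = 0$. To establish uniqueness, suppose that $\mu'$, $\alpha_i'$, and $\beta_j'$ are another set of values satisfying the required conditions. Then

\[ \mu + \alpha_i + \beta_j = \mu' + \alpha_i' + \beta_j' \quad\text{for all } i \text{ and } j. \]

If we multiply both sides by $v_i w_j$ and sum over $i$ and $j$, we find that $\mu = \mu'$. Hence, $\alpha_i + \beta_j = \alpha_i' + \beta_j'$. If we now multiply both sides by $v_i$ and sum over $i$, we find that $\beta_j = \beta_j'$. Similarly, if we multiply both sides by $w_j$ and sum over $j$, we find that $\alpha_i = \alpha_i'$.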


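20. The values of $\mu$, $\alpha_i$, and $\beta_j$ must be chosen to minimize

\[ Q = \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K_{ij}}(Y_{ijk} - \mu - \alpha_i - \beta_j)^2. \]

The equation $\partial Q/\partial\mu = 0$ reduces to

\[ Y_{+++} - n\mu - \sum_{i=1}^{I} K_{i+}\alpha_i - \sum_{j=1}^{J} K_{+j}\beta_j = 0, \]

where $n = K_{++}$ is the total number of observations in the two-way layout. Next we shall calculate $\partial Q/\partial\alpha_i$ for $i = 1, \ldots, I-1$, keeping in mind that $\sum_{i=1}^{I} K_{i+}\alpha_i = 0$. Hence, $\partial\alpha_I/\partial\alpha_i = -K_{i+}/K_{I+}$. It can be found that the equation $\partial Q/\partial\alpha_i = 0$ reduces to the following equation for $i = 1, \ldots, I-1$:

\[ Y_{i++} - K_{i+}\mu - K_{i+}\alpha_i - \sum_{j} K_{ij}\beta_j = \frac{K_{i+}}{K_{I+}}\left(Y_{I++} - K_{I+}\mu - K_{I+}\alpha_I - \sum_{j} K_{Ij}\beta_j\right). \]

In other words, the following quantity must have the same value for $i = 1, \ldots, I$:

\[ \frac{1}{K_{i+}}\left(Y_{i++} - K_{i+}\mu - K_{i+}\alpha_i - \sum_{j} K_{ij}\beta_j\right). \]

Similarly, the set of equations $\partial Q/\partial\beta_j = 0$ for $j = 1, \ldots, J-1$ reduces to the requirement that the following quantity have the same value for $j = 1, \ldots, J$:

\[ \frac{1}{K_{+j}}\left(Y_{+j+} - K_{+j}\mu - \sum_{i} K_{ij}\alpha_i - K_{+j}\beta_j\right). \]

It can be verified by direct substitution that the values of $\hat{\mu}$, $\hat{\alpha}_i$, and $\hat{\beta}_j$ given in the exercise satisfy all these requirements and, hence, are the least squares estimators.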


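21. As in the solution of Exercise 18 of Sec. 11.8, let

\[ \hat{\mu} = \sum_{r,s,t} m_{rst}Y_{rst}, \qquad \hat{\alpha}_i = \sum_{r,s,t} a_{rst}Y_{rst}, \qquad \hat{\beta}_j = \sum_{r,s,t} b_{rst}Y_{rst}. \]

To show that $\mathrm{Cov}(\hat{\mu}, \hat{\alpha}_i) = 0$, we must show that $\sum_{r,s,t} m_{rst}a_{rst} = 0$. But $m_{rst} = \frac{1}{n}$ for all $r$, $s$, $t$, and

\[ a_{rst} = \begin{cases} \dfrac{1}{K_{i+}} - \dfrac{1}{n} & \text{for } r = i, \\[1ex] -\dfrac{1}{n} & \text{for } r \ne i. \end{cases} \]

Hence, it is found that $\sum_{r,s,t} m_{rst}a_{rst} = 0$. Similarly,

\[ b_{rst} = \begin{cases} \dfrac{1}{K_{+j}} - \dfrac{1}{n} & \text{for } s = j, \\[1ex] -\dfrac{1}{n} & \text{for } s \ne j, \end{cases} \]

and $\sum_{r,s,t} m_{rst}b_{rst} = 0$, so $\mathrm{Cov}(\hat{\mu}, \hat{\beta}_j) = 0$.


22. We must show that $\sum_{r,s,t} a_{rst}b_{rst} = 0$, where $a_{rst}$ and $b_{rst}$ are given in the solution of Exercise 21:

\[ \sum_{r,s,t} a_{rst}b_{rst} = \sum_{k=1}^{K_{ij}}\left(\frac{1}{K_{i+}} - \frac{1}{n}\right)\left(\frac{1}{K_{+j}} - \frac{1}{n}\right) - \frac{1}{n}\sum_{s \ne j}\sum_{k=1}^{K_{is}}\left(\frac{1}{K_{i+}} - \frac{1}{n}\right) - \frac{1}{n}\sum_{r \ne i}\sum_{k=1}^{K_{rj}}\left(\frac{1}{K_{+j}} - \frac{1}{n}\right) \]
\[ \qquad + \frac{1}{n^2}\sum_{r \ne i}\sum_{s \ne j} K_{rs}. \]

Since $nK_{ij} = K_{i+}K_{+j}$, it can be verified that this sum is 0.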


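23. Consider the expression for $\theta_{ijk}$ given in this exercise. If we sum both sides of this expression over $i$, $j$, and $k$, then it follows from the constraints on the $\alpha$'s, $\beta$'s, and $\gamma$'s, that $\mu = \bar{\theta}_{+++}$. If we substitute this value for $\mu$ and sum both sides of the expression over $j$ and $k$, we can solve the result for $\alpha_i^A$. Similarly, $\alpha_j^B$ can be found by summing over $i$ and $k$, and $\alpha_k^C$ by summing over $i$ and $j$. After these values have been found, we can determine $\beta_{ij}^{AB}$ by summing both sides over $k$, and determine $\beta_{ik}^{AC}$ and $\beta_{jk}^{BC}$ similarly. Finally, $\gamma_{ijk}$ is determined by taking its value to be whatever is necessary to satisfy the required expression for $\theta_{ijk}$. In this way, we obtain the following values:

\[ \mu = \bar{\theta}_{+++}, \]
\[ \alpha_i^A = \bar{\theta}_{i++} - \bar{\theta}_{+++}, \]
\[ \alpha_j^B = \bar{\theta}_{+j+} - \bar{\theta}_{+++}, \]
\[ \alpha_k^C = \bar{\theta}_{++k} - \bar{\theta}_{+++}, \]
\[ \beta_{ij}^{AB} = \bar{\theta}_{ij+} - \bar{\theta}_{i++} - \bar{\theta}_{+j+} + \bar{\theta}_{+++}, \]
\[ \beta_{ik}^{AC} = \bar{\theta}_{i+k} - \bar{\theta}_{i++} - \bar{\theta}_{++k} + \bar{\theta}_{+++}, \]
\[ \beta_{jk}^{BC} = \bar{\theta}_{+jk} - \bar{\theta}_{+j+} - \bar{\theta}_{++k} + \bar{\theta}_{+++}, \]
\[ \gamma_{ijk} = \theta_{ijk} - \bar{\theta}_{ij+} - \bar{\theta}_{i+k} - \bar{\theta}_{+jk} + \bar{\theta}_{i++} + \bar{\theta}_{+j+} + \bar{\theta}_{++k} - \bar{\theta}_{+++}. \]

It can be verified that these quantities satisfy all the specified constraints. They are unique by the method of their construction, since they were derived as the only values that could possibly satisfy the constraints.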


24. (a) The plot of Buchanan vote against total county vote is in Fig. S.11.3. Palm Beach county is plotted with the symbol P.

Figure S.11.3: Figure for Exercise 24a in Sec. 11.9.

(b) The summary of the fitted regression is $\hat{\beta}_0 = 83.69$, $\hat{\beta}_1 = 0.00153$, $\bar{x}_n = 8.254 \times 10^4$, $s_x^2 = 1.035 \times 10^{12}$, $n = 67$, and $\sigma' = 120.1$.

(c) The plot of residuals is in Fig. S.11.4. Notice that the residuals are much more spread out at the right side of the plot than at the left. There also appears to be a bit of a curve to the plot.

Figure S.11.4: Figure for Exercise 24c in Sec. 11.9. (Residual plotted against Total County Vote.)

(d) The summary of the fitted regression is $\hat{\beta}_0 = -2.746$, $\hat{\beta}_1 = 0.7263$, $\bar{x}_n = 10.32$, $s_x^2 = 151.5$, $n = 67$, and $\sigma' = 0.4647$.

(e) The new residual plot is in Fig. S.11.5. The spread is much more uniform from right to left and the curve is no longer evident.


Figure S.11.5: Figure for Exercise 24e in Sec. 11.9.

(f) The quantile we need is $T_{65}^{-1}(0.995) = 2.654$. The logarithm of total vote for Palm Beach county is 12.98. A prediction interval for logarithm of Buchanan vote when $X = 12.98$ is

\[ -2.746 + 12.98 \times 0.7263 \pm 2.654 \times 0.4647\left(1 + \frac{1}{67} + \frac{(12.98 - 10.32)^2}{151.5}\right)^{1/2} = [5.419, 7.942]. \]

Converting to Buchanan vote, we take $e$ to the power of each endpoint and get the interval [225.6, 2812].


(g) The official Gore total was 2912253, while the official Bush total was 2912790. Suppose that 2812 
people in Palm Beach county had actually voted for Buchanan and the other 3411 — 2812 = 599 
had voted for Gore. Then the Gore total would have been 2912852, enough to make him the 
winner. 
Simulation 


All exercises that involve simulation will produce different answers when run repeatedly. Hence, one 
cannot expect numerical results to match perfectly with any answers given here. 


For all exercises that require simulation, students will need access to software that will do some of the 
work for them. At a minimum they will need software to simulate uniform pseudo-random numbers on the 
interval [0, 1]. Some of the exercises require software to simulate all of the famous distributions and compute 
the c.d.f.’s and quantile functions of the famous distributions. 


Some of the simulations require a nonnegligible programming effort. In particular, Markov chain Monte 
Carlo (MCMC) requires looping through all of the coordinates inside of the iteration loop. Assessing conver- 
gence and simulation standard error for a MCMC result requires running several chains in parallel. Students 
who do not have a lot of programming experience might need some help with these exercises. 


If one is using the software R, the function runif will return uniform pseudo-random numbers; its 
argument is how many you want. For other named distributions, as mentioned earlier in this manual, one 
can use the functions rbinom, rhyper, rpois, rnbinom, rgeom, rnorm, rlnorm, rgamma, rexp, rbeta, and 
rmultinom. 


Most simulations require calculation of averages and sample variances. The functions mean, median, and 
var compute the average, sample median, and sample variance respectively of their first argument. Each of 
these has an optional argument na.rm, which can be set either to TRUE or to FALSE (the default). If true, 
na.rm causes missing values to be ignored in the calculation. Missing values in simulations should be rare if 
calculations are being done correctly. Other useful functions for simulations are sort and sort.list. They 
both take a vector argument. The first returns its argument sorted algebraically from smallest to largest (or 
largest to smallest with optional argument decreasing=TRUE.) The second returns a list of integers giving the 
locations (in the vector) of the ordered values of its argument. The functions min and max give the smallest 
and largest values of their argument. 


For looping, one can use 
for(i in 1:n){ ... } 
to perform all of the functions between { and } once for each value of i from 1 to n. For an indeterminate 
number of iterations, one can use 
while(expression){ ... } 
where expression stands for a logical expression that changes value from TRUE to FALSE at some point during 
the iterations. 


A long series of examples appears at the end. 
12.1 What is Simulation? 


Solutions to Exercises 


1. Simulate a large number of exponential random variables with parameter 1, and take their average.


2. We would expect that every so often one of the simulated random variables would be much larger than
the others and the sample average would go up significantly when that random variable got included
in the average. The more simulations we do, the more such large observations we would expect, and
the average should keep getting larger.


3. We would expect to get a lot of very large positive observations and a lot of very large negative
observations. Each time we got one, the average would either jump up (when we get a positive one)
or jump down (when we get a negative one). As we sampled more and more observations, the average
should bounce up and down quite a bit and never settle anywhere.


4. We could count how many Bernoulli's we had to sample to get a success (1) and call that the first
observation of a geometric random variable. Starting with the next Bernoulli, start counting again
until the next 1, and call that the second geometric, etc. Average all the observed geometrics to
approximate the mean.
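
The counting scheme just described can be sketched in a few lines. The following Python example is an illustration only; the function name, the success probability 0.25, and the seed are hypothetical choices. It counts trials up to and including each success, so the long-run average is $1/p$; under the convention that a geometric random variable counts failures only, subtract 1 from each count.

```python
import random


def geometrics_from_bernoullis(p, n_geometrics, rng):
    """Turn a stream of Bernoulli(p) trials into counts of trials per success."""
    counts = []
    trials = 0
    while len(counts) < n_geometrics:
        trials += 1
        if rng.random() < p:      # a success (1) ends the current count
            counts.append(trials)
            trials = 0            # start counting again with the next Bernoulli
    return counts


rng = random.Random(12345)
sample = geometrics_from_bernoullis(0.25, 20_000, rng)
mean_trials = sum(sample) / len(sample)
print(mean_trials)   # should be close to 1/p = 4
```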


5. (a) Simulate three exponentials at a time. Call the sum of the first two $X$ and call the third one $Y$.
For each triple, record whether $X < Y$ or not. The proportion of times that $X < Y$ in a large
sample of triples approximates $\Pr(X < Y)$.


(b) Let $Z_1, Z_2, Z_3$ be i.i.d. having the exponential distribution with parameter $\beta$, and let $W_1, W_2, W_3$
be i.i.d. having the exponential distribution with parameter 1. Then $Z_1 + Z_2 < Z_3$ if and only if
$\beta Z_1 + \beta Z_2 < \beta Z_3$. But $(\beta Z_1, \beta Z_2, \beta Z_3)$ has precisely the same joint distribution as $(W_1, W_2, W_3)$.
So, the probability that $Z_1 + Z_2 < Z_3$ is the same as the probability that $W_1 + W_2 < W_3$, and
it doesn't matter which parameter we use for the exponential distribution. All simulations will
approximate the same quantity as we would approximate using parameter 1.


(c) We know that X and Y are independent and that X has the gamma distribution with parameters 
2 and 0.4. The joint p.d-f. is 


f(a,y) = 0.422 exp(—0.42)0.4 exp(—0.4y), for x,y > 0. 
The integral to compute the probability is 
Prix <Y)= I i 0.4°2 exp(—0.4[ax + y])dydz. 
xz 
There is also a version with the x integral on the inside. 


oo ry 
Pry 2 y= | | 0.4°.x exp(—0.4[a + y])dxdy. 
0 Jo 
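The simulation described in part (a), together with the parameter invariance argued in part (b), can be sketched in a few lines. This is only an illustrative sketch: the function name `prob_x_less_than_y`, the seed, and the simulation sizes are assumptions, not part of the exercise.

```python
import random

def prob_x_less_than_y(rate, n_triples, seed=0):
    """Estimate Pr(X < Y), where X is the sum of two exponentials and Y is a third."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_triples):
        x = rng.expovariate(rate) + rng.expovariate(rate)  # sum of the first two
        y = rng.expovariate(rate)                          # the third exponential
        if x < y:
            hits += 1
    return hits / n_triples

if __name__ == "__main__":
    # Part (b): the answer should not depend on the exponential parameter.
    for rate in (0.4, 1.0, 5.0):
        print(rate, prob_x_less_than_y(rate, 100_000))
```

Evaluating either integral in part (c) gives exactly 1/4, so the printed proportions should all be close to 0.25 whatever rate is used.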


12.2 Why Is Simulation Useful? 


Commentary 


This section introduces the fundamental concepts of simulation and illustrates the basic calculations that 
underlie almost all simulations. Instructors should stress the need for assessing the variability in a simulation 
result. For complicated simulations, it can be difficult to assess variability, but students need to be aware 
that a highly variable simulation may be no better than an educated guess. 
The lengthy examples (12.2.13 and 12.2.14) at the end of the section and the exercises (15 and 16) that
go with them are mainly illustrative of the power of simulation. These would only be covered in a course that
devoted a lot of time to simulation.


Solutions to Exercises 


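1. Since $E(Z) = \mu$, the Chebyshev inequality says that $\Pr(|Z - \mu| \le \epsilon) \ge 1 - \mathrm{Var}(Z)/\epsilon^2$. Since Z is the
average of v independent random variables with variance $\sigma^2$, $\mathrm{Var}(Z) = \sigma^2/v$. It follows that
\[ \Pr(|Z - \mu| \le \epsilon) \ge 1 - \frac{\sigma^2}{\epsilon^2 v}. \]
Now, suppose that $v \ge \sigma^2/[\epsilon^2(1 - \gamma)]$. Then
\[ 1 - \frac{\sigma^2}{\epsilon^2 v} \ge 1 - (1 - \gamma) = \gamma. \]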

2. In Example 12.2.11, we are approximating $\sigma$ by 0.3892. According to Eq. (12.2.6), we need
\[ v \ge \frac{0.3892^2}{0.01^2 \times 0.01} = 151476.64. \]
So, v must be at least 151477.
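The arithmetic above is easy to reproduce. The following sketch wraps the Chebyshev-based bound $v \ge \sigma^2/[\epsilon^2(1-\gamma)]$ in a small helper; the name `chebyshev_sim_size` is hypothetical, and the values $\epsilon = 0.01$ and $\gamma = 0.99$ are read off from the denominator in the display.

```python
import math

def chebyshev_sim_size(sigma, eps, gamma):
    """Smallest integer v with v >= sigma^2 / (eps^2 * (1 - gamma))."""
    return math.ceil(sigma ** 2 / (eps ** 2 * (1.0 - gamma)))

if __name__ == "__main__":
    print(chebyshev_sim_size(0.3892, 0.01, 0.99))
```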


3. We could simulate a lot (say $v_0$) of standard normal random variables $W_1, \ldots, W_{v_0}$ and let $X_i = 7W_i + 2$.
Then each $X_i$ has the distribution of X. Let $W_i = \log(|X_i| + 1)$. We could then compute Z equal to
the average of the $W_i$'s as an estimate of $E(\log(|X| + 1))$. If we needed our estimate to be close to
$E(\log(|X| + 1))$ with high probability, we could estimate the variance of $W_i$ by the sample variance and
then use (12.2.5) to choose a possibly larger simulation size.


4. Simulate 15 random variables $U_1, \ldots, U_{15}$ with uniform distribution on the interval [0,1]. For $i =
1, \ldots, 13$, let $X_i = 2(U_i - 0.5)$ and for $i = 14, 15$, let $X_i = 20(U_i - 0.5)$. Then $X_1, \ldots, X_{15}$ have the
desired distribution. In most of my simulations, the median or the sample average was the closest to
0. The first simulation led to the following six values:


                        Trimmed mean
Estimator | Average   k=1     k=2     k=3     k=4     Median
Estimate  | 0.5634    0.3641  0.2205  0.2235  0.2359  0.1836


5. (a) In my ten samples, the sample median was closest to 0 nine times, and the k = 3 trimmed mean
was closest to 0 one time.


(b) Although the k = 2 trimmed mean was never closest to 0, it was also never very far from 0, and it
had the smallest average squared distance from 0. The k = 3 trimmed mean was a close second.
Here are the six values for my first 10 simulations:


                        Trimmed mean
Estimator | Average   k=1     k=2     k=3     k=4     Median
M.S.E.    | 0.4425    0.1354  0.0425  0.0450  0.0509  0.0508


These rankings were also reflected in a much larger simulation.
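A comparison like the one in Exercises 4 and 5 can be sketched as follows. The sample construction (13 values uniform on [-1, 1] and 2 values uniform on [-10, 10]) follows the solution to Exercise 4; the helper names, the seed, and the number of replications are illustrative assumptions, and the numbers produced will not match the tables above.

```python
import random
import statistics

def contaminated_sample(rng):
    """13 observations uniform on [-1, 1] and 2 observations uniform on [-10, 10]."""
    xs = [2 * (rng.random() - 0.5) for _ in range(13)]
    xs += [20 * (rng.random() - 0.5) for _ in range(2)]
    return xs

def trimmed_mean(xs, k):
    """Average after dropping the k smallest and k largest observations."""
    s = sorted(xs)
    kept = s[k:len(s) - k]
    return sum(kept) / len(kept)

def estimators(xs):
    est = {"average": trimmed_mean(xs, 0)}
    for k in (1, 2, 3, 4):
        est["trimmed k=%d" % k] = trimmed_mean(xs, k)
    est["median"] = statistics.median(xs)
    return est

def mse_table(n_sims, seed=0):
    """Average squared distance from 0 (the true center) for each estimator."""
    rng = random.Random(seed)
    totals = {}
    for _ in range(n_sims):
        for name, value in estimators(contaminated_sample(rng)).items():
            totals[name] = totals.get(name, 0.0) + value ** 2
    return {name: total / n_sims for name, total in totals.items()}

if __name__ == "__main__":
    for name, mse in mse_table(2000).items():
        print(name, round(mse, 4))
```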
6. (a) Simulate lots (say $v_0$) of random variables $X_1, \ldots, X_{v_0}$ and $Y_1, \ldots, Y_{v_0}$, where the $X_i$ have the beta
distribution with parameters 3.5 and 2.7 while the $Y_i$ have the beta distribution with parameters 1.8
and 4.2. Let $Z_i = X_i/(X_i + Y_i)$. The sample average of $Z_1, \ldots, Z_{v_0}$ should be close to the mean
of X/(X + Y) if $v_0$ is large enough.


(b) We could calculate the sample variance of $Z_1, \ldots, Z_{v_0}$ and use this as an estimate of $\sigma^2$ in
Eq. (12.2.5) with $\gamma = 0.98$ and $\epsilon = 0.01$ to obtain a new simulation size.


7. (a) The distribution of X is the contaminated normal distribution with p.d.f. given in Eq. (10.7.2)
with $\sigma = 1$, $\mu = 0$.


(b) To calculate a number in Table 10.40, we should simulate lots of samples of size 20 from the
distribution in part (a) with the desired $\epsilon$ (0.05 in this case). For each sample, compute the
desired estimator (the sample median in this case). Then compute the average of the squares
of the estimators (since $\mu = 0$ in our samples) and multiply by 20. As an example, we did two
simulations of size 10000 each and got 1.617 and 1.621.


8. (a) The description is the same as in Exercise 7(b) with "sample median" replaced by "trimmed mean
for k = 2" and 0.05 replaced by 0.1.


(b) We did two simulations of size 10000 each and got 2.041 and 2.088. It would appear that this
simulation is slightly more variable than the one in Exercise 7.


9. The marginal p.d.f. of X, obtained by integrating the joint p.d.f. over the other variable, is
\[ f(x) = \frac{3}{(x+1)^4}, \]
for x > 0. The c.d.f. of X is then
\[ F(x) = \int_0^x f(t)\,dt = 1 - \left(\frac{1}{x+1}\right)^3, \]
for x > 0, and F(x) = 0 for x ≤ 0. The median is that x such that F(x) = 1/2, which is easily seen to
be $2^{1/3} - 1 = 0.2599$.


10. (a) The c.d.f. of each $X_i$ is $F(x) = 1 - \exp(-\lambda x)$, for x > 0. The median is $\log(2)/\lambda$.


(b) Let $Y_i = X_i\lambda$, and let $M'$ be the sample median of $Y_1, \ldots, Y_{21}$. Then the $Y_i$'s have the exponential
distribution with parameter 1, the median of $Y_i$ is log(2), and $M' = M\lambda$. The M.S.E. of M is then
\[ E\left[\left(M - \frac{\log(2)}{\lambda}\right)^2\right] = \frac{1}{\lambda^2} E\left[(M\lambda - \log(2))^2\right] = \frac{1}{\lambda^2} E\left[(M' - \log(2))^2\right] = \frac{\theta}{\lambda^2}. \]


(c) Simulate a lot (say $21v_0$) of random variables $X_1, \ldots, X_{21v_0}$ having the exponential distribution
with parameter 1. For $j = 1, \ldots, v_0$, let $M_j$ be the sample median of $X_{21(j-1)+1}, \ldots, X_{21j}$. Let
$Y_j = (M_j - \log(2))^2$, and compute the sample average $Z = \frac{1}{v_0}\sum_{j=1}^{v_0} Y_j$ as an estimate of $\theta$. If you
want to see how good an estimate it is, compute the simulation standard error.
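The procedure in part (c) can be sketched as below. The function name `median_mse_estimate`, the seed, and the default simulation size are assumptions made for illustration; the simulation standard error is taken to be the sample standard deviation of the $Y_j$ divided by $v_0^{1/2}$.

```python
import math
import random
import statistics

def median_mse_estimate(v0, seed=0):
    """Estimate theta = E[(M' - log 2)^2] for the median M' of 21 exponential(1) draws."""
    rng = random.Random(seed)
    ys = []
    for _ in range(v0):
        sample = [rng.expovariate(1.0) for _ in range(21)]
        m = statistics.median(sample)          # sample median of one block of 21
        ys.append((m - math.log(2)) ** 2)      # squared error for that block
    z = sum(ys) / v0                           # estimate of theta
    sim_std_err = statistics.stdev(ys) / math.sqrt(v0)
    return z, sim_std_err

if __name__ == "__main__":
    print(median_mse_estimate(2000))
```

With $v_0 = 2000$ the estimate should land near 0.05, in line with the values reported for Exercise 1 of Sec. 12.3.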
11. In Example 12.2.4, $\mu_x$ and $\mu_y$ are independent with $(\mu_x - \mu_{x1})/(\beta_{x1}/[\alpha_{x1}\lambda_{x1}])^{1/2}$ having the t distribution
with $2\alpha_{x1}$ degrees of freedom and $(\mu_y - \mu_{y1})/(\beta_{y1}/[\alpha_{y1}\lambda_{y1}])^{1/2}$ having the t distribution with $2\alpha_{y1}$
degrees of freedom. We should simulate lots (say v) of t random variables $T_x^{(1)}, \ldots, T_x^{(v)}$ with $2\alpha_{x1}$
degrees of freedom and just as many t random variables $T_y^{(1)}, \ldots, T_y^{(v)}$ with $2\alpha_{y1}$ degrees of freedom.
Then let
\[ \mu_x^{(i)} = \mu_{x1} + T_x^{(i)} \left(\frac{\beta_{x1}}{\alpha_{x1}\lambda_{x1}}\right)^{1/2}, \]
\[ \mu_y^{(i)} = \mu_{y1} + T_y^{(i)} \left(\frac{\beta_{y1}}{\alpha_{y1}\lambda_{y1}}\right)^{1/2}, \]
for $i = 1, \ldots, v$. Then the values $\mu_x^{(i)} - \mu_y^{(i)}$ form a sample from the posterior distribution of $\mu_x - \mu_y$.


12. To the level of approximation in Eq. (12.2.7), we have
\[ Z = g(E(Y), E(W)) + g_1(E(Y), E(W))[Y - E(Y)] + g_2(E(Y), E(W))[W - E(W)]. \]
The variance of Z would then be
\[ g_1(E(Y), E(W))^2 \mathrm{Var}(Y) + g_2(E(Y), E(W))^2 \mathrm{Var}(W) + 2 g_1(E(Y), E(W))\, g_2(E(Y), E(W))\, \mathrm{Cov}(Y, W). \qquad (S.12.1) \]
Now substitute the entries of $\Sigma$ for the variances and covariance.


13. The function g in this exercise is $g(y, w) = w - y^2$ with partial derivatives
\[ g_1(y, w) = -2y, \]
\[ g_2(y, w) = 1. \]
In the formula for Var(Z) given in Exercise 12, make the following substitutions:


Exercise 12 | This exercise


where Z, V, and C are defined in Example 12.2.10. The result is $[(2\bar{Y})^2 Z + V - 4\bar{Y}C]/v$, which simplifies
to (12.2.3).


14. Let $Y_1, \ldots, Y_v$ be a large sample from the distribution of Y. Let $\bar{Y}$ be the sample average, and let V
be the sample variance. For each i, define $W_i = (Y_i - \bar{Y})^3/V^{3/2}$. Estimate the skewness by the sample
average of the $W_i$'s. Use the sample variance to compute a simulation standard error to see if the
simulation size is large enough.


15. (a) Since $S_u = S_0 \exp(\alpha u + W_u)$, we have that
\[ E(S_u) = S_0 \exp(\alpha u) E(\exp(W_u)) = S_0 \exp(\alpha u)\psi(1). \]
In order for this mean to be $S_0 \exp(ru)$, it is necessary and sufficient that $\psi(1) = \exp(u[r - \alpha])$,
or equivalently, $\alpha = r - \log(\psi(1))/u$.


(b) First, simulate lots (say v) of random variables $W^{(1)}, \ldots, W^{(v)}$ with the distribution of $W_u$. Define
the function h(s) as in Example 12.2.13. Define $Y^{(i)} = \exp(-ru)h(S_0 \exp[\alpha u + W^{(i)}])$, where r is
the risk free interest rate and $\alpha$ is the number found in part (a). The sample average of the $Y^{(i)}$'s
would estimate the appropriate price for the option. One should compute a simulation standard
error to see if the simulation size is large enough.


16. We can model our solution on Example 12.2.14. We should simulate a large number of operations of
the queue up to time t. For each simulated operation of the queue, count how many customers are
in the queue (including any being served). In order to simulate one instance of the queue operation
up to time t, we can proceed as follows. Simulate interarrival times $X_1, X_2, \ldots$ as exponential random
variables with parameter $\lambda$. Define $T_j = \sum_{i=1}^{j} X_i$ for $j = 1, 2, \ldots$. Stop simulating at the first k such that
$T_k > t$. Start the queue with $W_0 = 0$, where $W_j$ stands for the time that customer j leaves the queue.
In what follows, $S_j \in \{1, 2\}$ will stand for which server serves customer j, and $Z_j$ will stand for the
time at which customer j begins being served.


For $j = 1, \ldots, k - 1$ and each $i < j$, define
\[ U_{i,j} = \begin{cases} 1 & \text{if } W_i \ge T_j, \\ 0 & \text{otherwise.} \end{cases} \]
The number of customers in the queue when customer j arrives is $r = \sum_{i=0}^{j-1} U_{i,j}$.


• If r = 0, simulate U with a uniform distribution on the interval [0,1]. Set $S_j = 1$ if U < 1/2 and
$S_j = 2$ if U ≥ 1/2. Set $Z_j = T_j$.

• If r = 1, find the value i such that $W_i \ge T_j$ and set $S_j = 3 - S_i$ so that customer j goes to the
other server. Set $Z_j = T_j$.

• If r ≥ 2, simulate U with a uniform distribution on the interval [0,1], and let customer j leave if
$U < p_r$. If customer j leaves, set $W_j = T_j$. If customer j does not leave, find the second highest
value $W_i$ out of $W_1, \ldots, W_{j-1}$ and set $S_j = S_i$ and $Z_j = W_i$.


For each customer that does not leave, simulate a service time $Y_j$ having an exponential distribution
with parameter $\mu_{S_j}$, and set $W_j = Z_j + Y_j$. The number of customers in the queue at time t is the
number of $j \in \{1, \ldots, k - 1\}$ such that $W_j \ge t$.


12.3 Simulating Specific Distributions 


Commentary 


This section is primarily of mathematical interest. Most distributions with which students are familiar can be 
simulated directly with existing statistical software. Instructors who wish to steer away from the theoretical 
side of simulation should look over the examples before skipping this section in case they contain some points 
that they would like to make. For example, a method is given for computing simulation standard error when 
the simulation result is an entire sample c.d.f. (see page 811). This relies on results from Sec. 10.6. 


Solutions to Exercises 


1. (a) Here we are being asked to perform the simulation outlined in the solution to Exercise 10 in
Sec. 12.2 with $v_0 = 2000$ simulations. Each $Y_i$ (in the notation of that solution) can be simulated
by taking a random variable $U_i$ having uniform distribution on the interval [0,1] and setting
$Y_i = -\log(1 - U_i)$. In addition to the run whose answers are in the back of the text, here are
the results of two additional simulations: Approximation = 0.0536, sim. std. err. = 0.0023 and
Approximation = 0.0492, sim. std. err. = 0.0019.


(b) For the two additional simulations in part (a), the values of v needed to achieve the desired goal are 706
and 459.


Copyright © 2012 Pearson Education, Inc. Publishing as Addison-Wesley.


2. Let $V_i = a + (b - a)U_i$. Then the p.d.f. of $V_i$ is easily seen to be $1/(b - a)$ for $a < v < b$, so $V_i$ has the
desired uniform distribution.


3. The c.d.f. corresponding to $g_1$ is
\[ G_1(x) = \int_0^x \frac{1}{2t^{1/2}}\,dt = x^{1/2}, \quad \text{for } 0 < x < 1. \]
The quantile function is then $G_1^{-1}(p) = p^2$ for 0 < p < 1. To simulate a random variable with the
p.d.f. $g_1$, simulate U with a uniform distribution on the interval [0,1] and let $X = U^2$. The c.d.f.
corresponding to $g_2$ is
\[ G_2(x) = \int_0^x \frac{1}{2(1-t)^{1/2}}\,dt = 1 - (1 - x)^{1/2}, \quad \text{for } 0 < x < 1. \]
The quantile function is then $G_2^{-1}(p) = 1 - (1 - p)^2$ for 0 < p < 1. To simulate a random variable with
the p.d.f. $g_2$, simulate U with a uniform distribution on the interval [0,1] and let $X = 1 - (1 - U)^2$.


4. The c.d.f. of a Cauchy random variable is
\[ F(x) = \int_{-\infty}^x \frac{dt}{\pi(1 + t^2)} = \frac{1}{\pi}\left[\arctan(x) + \frac{\pi}{2}\right]. \]
The quantile function is $F^{-1}(p) = \tan(\pi[p - 1/2])$. So, if U has a uniform distribution on the interval
[0, 1], then $\tan(\pi[U - 1/2])$ has a Cauchy distribution.
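As a quick check of the quantile-function method for the Cauchy distribution, here is a minimal sketch. The function names and the seed are illustrative assumptions; the check at the end uses the fact that the quartiles of the Cauchy distribution are -1 and 1, so about half of the simulated values should fall in [-1, 1].

```python
import math
import random

def cauchy_quantile(p):
    """Quantile function of the Cauchy distribution: tan(pi * (p - 1/2))."""
    return math.tan(math.pi * (p - 0.5))

def simulate_cauchy(n, seed=0):
    """Apply the quantile function to n uniform draws."""
    rng = random.Random(seed)
    return [cauchy_quantile(rng.random()) for _ in range(n)]

if __name__ == "__main__":
    xs = simulate_cauchy(20_000)
    inside = sum(1 for x in xs if -1.0 <= x <= 1.0) / len(xs)
    print("fraction in [-1, 1]:", inside)
```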


5. The probability of acceptance on each attempt is 1/k. Since the attempts (trials) are independent, the
number of failures X until the first acceptance is a geometric random variable with parameter 1/k. The
number of iterations until the first acceptance is X + 1. The mean of X is (1 - 1/k)/(1/k) = k - 1, so
the mean of X + 1 is k.

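6. (a) The c.d.f. of the Laplace distribution is
\[ F(x) = \int_{-\infty}^x \frac{1}{2}\exp(-|t|)\,dt = \begin{cases} \frac{1}{2}\exp(x) & \text{if } x < 0, \\ 1 - \frac{1}{2}\exp(-x) & \text{if } x \ge 0. \end{cases} \]
The quantile function is then
\[ F^{-1}(p) = \begin{cases} \log(2p) & \text{if } 0 < p < 1/2, \\ -\log(2[1 - p]) & \text{if } 1/2 \le p < 1. \end{cases} \]
Simulate a uniform random variable U on the interval [0,1] and let $X = F^{-1}(U)$.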
(b) Define
\[ f(x) = \frac{1}{(2\pi)^{1/2}}\exp(-x^2/2), \]
\[ g(x) = \frac{1}{2}\exp(-|x|). \]
We need to find a constant k such that $kg(x) \ge f(x)$ for all x. Equivalently, we need a constant c
such that
\[ c \ge \frac{\exp(-x^2/2)}{\exp(-|x|)}, \qquad (S.12.2) \]
for all x. Then we can set $k = c(2/\pi)^{1/2}$. The smallest c that satisfies (S.12.2) is the supremum
of $\exp(|x| - x^2/2)$. This function is symmetric around 0, so we can look for $\sup_{x \ge 0} \exp(x - x^2/2)$. To
maximize this, we can maximize $x - x^2/2$ instead. The maximum of $x - x^2/2$ occurs at x = 1, so
c = exp(1/2). Now, use acceptance/rejection with $k = \exp(1/2)(2/\pi)^{1/2} = 1.315$.
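The acceptance/rejection scheme of Exercise 6 can be sketched as follows. A Laplace proposal is drawn with the quantile function from part (a) and accepted with probability $f(x)/[kg(x)]$, which for $k = \exp(1/2)(2/\pi)^{1/2}$ simplifies to $\exp(-(|x| - 1)^2/2)$. The function names and the seed are assumptions for illustration only.

```python
import math
import random

def laplace_quantile(p):
    """Quantile function of the standard Laplace distribution (part (a))."""
    return math.log(2 * p) if p < 0.5 else -math.log(2 * (1 - p))

def normal_by_rejection(n, seed=0):
    """Return n standard normal draws and the observed acceptance rate."""
    rng = random.Random(seed)
    accepted = []
    attempts = 0
    while len(accepted) < n:
        u = rng.random()
        if u == 0.0:          # avoid log(0); this has negligible probability
            continue
        attempts += 1
        x = laplace_quantile(u)
        # f(x) / (k g(x)) = exp(-x^2/2 + |x| - 1/2) = exp(-(|x| - 1)^2 / 2)
        if rng.random() <= math.exp(-0.5 * (abs(x) - 1.0) ** 2):
            accepted.append(x)
    return accepted, n / attempts

if __name__ == "__main__":
    xs, rate = normal_by_rejection(20_000)
    print("acceptance rate:", rate, "expected about", 1 / 1.315)
```

By the result of Exercise 5, the acceptance rate should be close to 1/k, roughly 0.76.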


7. Simulate a random sample $X_1, \ldots, X_{11}$ from the standard normal distribution. Then $\sum_{i=1}^{4} X_i^2$ has the
$\chi^2$ distribution with 4 degrees of freedom and is independent of $\sum_{i=5}^{11} X_i^2$, which has the $\chi^2$ distribution
with 7 degrees of freedom. It follows that
\[ \frac{7\sum_{i=1}^{4} X_i^2}{4\sum_{i=5}^{11} X_i^2} \]
has the F distribution with 4 and 7 degrees of freedom.


8. (a) I did five simulations of the type requested and got the estimates 1.325, 1.385, 1.369, 1.306, and
1.329. There seems to be quite a bit of variability if we want three significant digits.


(b) The five variance estimates were 1.333, 1.260, 1.217, 1.366, and 1.200.


(c) The required sample sizes varied from 81000 to 91000, suggesting that we do not yet have a very
precise estimate.


9. The simplest acceptance/rejection algorithm would use a uniform distribution on the interval [0, 2].
That is, let g(x) = 0.5 for 0 < x < 2. Then $(4/3)g(x) \ge f(x)$ for all x, i.e., k = 4/3. We could simulate
U and V both having a uniform distribution on the interval [0,1]. Then let X = 2V if $2f(2V) \ge (4/3)U$
and reject otherwise.


10. Using the prior distribution stated in the exercise, the posterior distributions for the probabilities of no
relapse in the four treatment groups are beta with the following parameters:


Group       | α    β
Imipramine  | 23   19
Lithium     | 26   15
Combination | 17   22
Placebo     | 11   25


We then simulate 5000 vectors of four beta random variables with the above parameters. Then we
see what proportion of those 5000 vectors have the imipramine parameter the largest. We did five
such simulations and got the proportions 0.1598, 0.1626, 0.1668, 0.1650, and 0.1650. The sample sizes
required to achieve the desired accuracy are all around 5300.
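The simulation just described can be sketched with the standard library's beta generator. The parameter pairs are the ones in the table; the dictionary name `POSTERIOR`, the function name, and the seed are illustrative assumptions, and the proportion obtained will vary from run to run in the same way as the five values quoted above.

```python
import random

# Posterior beta parameters (alpha, beta) for the probability of no relapse.
POSTERIOR = {
    "Imipramine": (23, 19),
    "Lithium": (26, 15),
    "Combination": (17, 22),
    "Placebo": (11, 25),
}

def prob_imipramine_largest(n_sims, seed=0):
    """Proportion of simulated vectors in which the imipramine parameter is largest."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sims):
        draws = {g: rng.betavariate(a, b) for g, (a, b) in POSTERIOR.items()}
        if max(draws, key=draws.get) == "Imipramine":
            wins += 1
    return wins / n_sims

if __name__ == "__main__":
    print(prob_imipramine_largest(5000))
```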


11. The $\chi^2$ distribution with m degrees of freedom is the same as the gamma distribution with parameters
m/2 and 1/2. So, we should simulate $Y^{(i)}$ having the $\chi^2$ distribution with n - p degrees of freedom
and set $\tau^{(i)} = Y^{(i)}/S^2_{\mathrm{Resid}}$.


12. We did a simulation of size v = 2000. 


(a) The plot of the sample c.d.f. of the $|\mu_x^{(i)} - \mu_y^{(i)}|$ values is in Fig. S.12.1.


[Figure S.12.1: Sample c.d.f. of $|\mu_x^{(i)} - \mu_y^{(i)}|$ values for Exercise 12a in Sec. 12.3. Horizontal axis: absolute difference between group means (0 to 25). Vertical axis: sample d.f.]


(b) The histogram of the ratios of calcium supplement precision to placebo precision is given in
Fig. S.12.2. Only 12% of the simulated $\log(\tau_x^{(i)}/\tau_y^{(i)})$ were positive and 37% were less than -1.


[Figure S.12.2: Histogram of $\log(\tau_x^{(i)}/\tau_y^{(i)})$ values for Exercise 12b in Sec. 12.3. Horizontal axis: log-ratio of precisions (-3 to 1). Vertical axis: count.]


There seems to be a sizeable probability that the two precisions (hence the variances) are unequal. 
13. Let $X = F^{-1}(U)$, where $F^{-1}$ is defined in Eq. (12.3.7) and U has a uniform distribution on the interval
[0,1]. Let G be the c.d.f. of X. We need to show that G = F, where F is defined in Eq. (12.3.6). Since
$F^{-1}$ only takes the values $t_1, \ldots, t_n$, it follows that G has jumps at those values and is flat everywhere
else. Since F also has jumps at $t_1, \ldots, t_n$ and is flat everywhere else, we only need to show that
F(x) = G(x) for $x \in \{t_1, \ldots, t_n\}$. Let $q_n = 1$. Then $X \le t_i$ if and only if $U \le q_i$ for $i = 1, \ldots, n$. Since
$\Pr(U \le q_i) = q_i$, it follows that $G(t_i) = q_i$ for $i = 1, \ldots, n$. That is, G(x) = F(x) for $x \in \{t_1, \ldots, t_n\}$.


14. First, establish the Bonferroni inequality. Let $A_1, \ldots, A_k$ be events. Then
\[ \Pr\left(\bigcap_{i=1}^{k} A_i\right) = 1 - \Pr\left(\bigcup_{i=1}^{k} A_i^c\right) \ge 1 - \sum_{i=1}^{k} \Pr(A_i^c) = 1 - \sum_{i=1}^{k} [1 - \Pr(A_i)]. \]
Now, let k = 3 and
\[ A_i = \{|G_{v,i}(x) - G_i(x)| \le 0.0082, \text{ for all } x\}, \]
for i = 1, 2, 3. The event stated in the exercise is $\bigcap_{i=1}^{3} A_i$. According to the arguments in Sec. 10.6,
\[ \Pr\left(60000^{1/2}|G_{v,i}(x) - G_i(x)| \le 2, \text{ for all } x\right) \approx 0.9993. \]
Since $2/60000^{1/2} = 0.0082$, we have $\Pr(A_i) \approx 0.9993$ for i = 1, 2, 3. The Bonferroni inequality then
says that $\Pr(\bigcap_{i=1}^{3} A_i)$ is about 0.9979 or more.


15. The proof is exactly what the hint says. All joint p.d.f.'s should be considered joint p.f./p.d.f.'s and
the p.d.f.'s of X and Y should be considered p.f.'s instead. The only integral over x in the proof is in
the second displayed equation in the proof. The outer integral in that equation should be replaced by
a sum over all possible x values. The rest of the proof is identical to the proof of Theorem 12.3.1.


16. Let $p_i = \exp(-\theta)\theta^i/(i!)$ for $i = 0, 1, \ldots$ and let $q_k = \sum_{i=0}^{k} p_i$. Let U have a uniform distribution on the
interval [0,1]. Let Y be the smallest k such that $U \le q_k$. Then Y has a Poisson distribution with mean
$\theta$.
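The search for the smallest k with $U \le q_k$ can be sketched as below. The function name `simulate_poisson` is hypothetical, and the extra `p > 0.0` guard is an added safeguard (not part of the solution) against floating-point rounding when U is extremely close to 1.

```python
import math
import random

def simulate_poisson(theta, rng):
    """Return the smallest k such that U <= q_k, where q_k = p_0 + ... + p_k."""
    u = rng.random()
    k = 0
    p = math.exp(-theta)   # p_0
    q = p                  # q_0
    while u > q and p > 0.0:
        k += 1
        p *= theta / k     # p_k = p_{k-1} * theta / k
        q += p
    return k

if __name__ == "__main__":
    rng = random.Random(0)
    draws = [simulate_poisson(5.0, rng) for _ in range(20_000)]
    print("sample mean:", sum(draws) / len(draws))
```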


17. Let $\{x_1, \ldots, x_m\}$ be the set of values that have positive probability under at least one of $g_1, \ldots, g_n$.
That is, for each $j = 1, \ldots, m$ there is at least one i such that $g_i(x_j) > 0$ and for each $i = 1, \ldots, n$,
$\sum_{j=1}^{m} g_i(x_j) = 1$. Then, the law of total probability says that
\[ \Pr(X = x_j) = \sum_{i=1}^{n} \frac{1}{n} g_i(x_j). \qquad (S.12.3) \]
Since $x_1, \ldots, x_m$ are the only values that X can take, Eq. (S.12.3) specifies the entire p.f. of X and we
see that Eq. (S.12.3) is the same as Eq. (12.3.8).
18. The Poisson probabilities with mean 5 from the table in the text are


x     | 0      1      2      3      4      5      6      7      8
Prob. | .0067  .0337  .0842  .1404  .1755  .1755  .1462  .1044  .0653

x     | 9      10     11     12     13     14     15     16
Prob. | .0363  .0181  .0082  .0034  .0013  .0005  .0002  .0001


where we have put the remainder probability under x = 16. In this case we have n = 17 different possible
values. Since 1/17 = .0588, we can use $x_1 = 0$ and $y_1 = 2$. Then $g_1(0) = .1139$ and $g_1(2) = .8861$.
This leaves probability .0842 - (1 - .1139)/17 = .0321 for the value 2. Next, take $x_2 = 1$ and $y_2 = 3$. Then $g_2(1) = .5729$
and $g_2(3) = .4271$. This leaves probability .1153 for the value 3. Next, take $x_3 = 2$ and $y_3 = 3$ so that $g_3(2) = .5453$,
$g_3(3) = .4547$, and the probability left for the value 3 is .0885. Next, take $x_4 = 9$ and $y_4 = 3$, etc. The result of 16 such iterations
is summarized in Table S.12.1.


Table S.12.1: Result of alias method in Exercise 18 of Sec. 12.3


 i | x_i  g_i(x_i) | y_i  g_i(y_i)
 1 |  0   .1139    |  2   .8861
 2 |  1   .5729    |  3   .4271
 3 |  2   .5453    |  3   .4547
 4 |  9   .6171    |  3   .3829
 5 | 10   .3077    |  3   .6923
 6 |  3   .4298    |  4   .5702
 7 | 11   .1394    |  4   .8606
 8 | 12   .0578    |  4   .9422
 9 |  4   .6105    |  5   .3895
10 | 13   .0221    |  5   .9779
11 | 14   .0085    |  5   .9915
12 |  5   .6246    |  6   .3754
13 | 15   .0034    |  6   .9966
14 | 16   .0017    |  6   .9983
15 |  6   .1151    |  7   .8849
16 |  7   .8899    |  8   .1101
17 |  8   1        |


The alias method is not unique. For example, we could have started with $x_1 = 1$ and $y_1 = 3$ or many
other possible combinations. Each choice would lead to a different version of Table S.12.1.
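One way to automate the construction is sketched below. It is a greedy variant, not the sequence of choices used for Table S.12.1: at each step it pairs the value with the smallest remaining probability (at most 1/n) with the value having the largest remaining probability (at least 1/n). Since the alias method is not unique, the resulting table differs from the one above, but averaging the $g_i$ recovers the same p.f. All names (`POISSON5`, `build_alias_table`, `sample_alias`) are illustrative assumptions.

```python
import random

# Poisson(5) probabilities for x = 0, ..., 16, with the remainder placed at x = 16.
POISSON5 = [.0067, .0337, .0842, .1404, .1755, .1755, .1462, .1044, .0653,
            .0363, .0181, .0082, .0034, .0013, .0005, .0002, .0001]

def build_alias_table(probs):
    """Return n triples (x, g_x, y): g_i puts mass g_x on x and 1 - g_x on y."""
    n = len(probs)
    remaining = dict(enumerate(probs))
    table = []
    while len(remaining) > 1:
        x = min(remaining, key=remaining.get)   # remaining mass at most 1/n
        f_x = remaining.pop(x)
        y = max(remaining, key=remaining.get)   # remaining mass at least 1/n
        g_x = n * f_x
        remaining[y] -= (1.0 - g_x) / n         # y donates the rest of this 1/n slot
        table.append((x, g_x, y))
    last = next(iter(remaining))
    table.append((last, 1.0, last))             # final value gets a slot to itself
    return table

def sample_alias(table, rng):
    """Pick one g_i uniformly at random, then choose between its two values."""
    x, g_x, y = table[rng.randrange(len(table))]
    return x if rng.random() < g_x else y

if __name__ == "__main__":
    table = build_alias_table(POISSON5)
    rng = random.Random(0)
    draws = [sample_alias(table, rng) for _ in range(20_000)]
    print("sample mean:", sum(draws) / len(draws))
```

Each draw costs one uniform integer and one uniform real, regardless of how many values the distribution has, which is the attraction of the method.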


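19. For $k = 1, \ldots, n$, I = k if and only if $k \le nY + 1 < k + 1$. Hence
\[ \Pr(I = k) = \Pr\left(\frac{k-1}{n} \le Y < \frac{k}{n}\right) = \frac{1}{n}. \]
The conditional c.d.f. of U given I = k is
\[ \Pr(U \le t \mid I = k) = \Pr(nY + 1 - I \le t \mid I = k) \]
\[ = \frac{\Pr(nY + 1 - k \le t,\ I = k)}{\Pr(I = k)} \]
\[ = \frac{\Pr\left(Y \le \frac{t + k - 1}{n},\ \frac{k-1}{n} \le Y < \frac{k}{n}\right)}{1/n} \]
\[ = n \Pr\left(\frac{k-1}{n} \le Y \le \frac{t + k - 1}{n}\right) = t, \]
for 0 < t < 1. So, the conditional distribution of U given I = k is uniform on the interval [0, 1] for all
k. Since the conditional distribution is the same for all k, U and I are independent and the marginal
distribution of U is uniform on the interval [0, 1].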
12.4 Importance Sampling


Commentary 


Users of importance sampling might forget to check whether the importance function leads to a finite variance 
estimator. If the ratio of the function being integrated to the importance function is not bounded, one might 
have an infinite variance estimator. This doesn’t happen in the examples in the text, but students should 
be made aware of the possibility. This section ends with an introduction to stratified importance sampling. 
This is an advanced topic that is quite useful, but might be skipped in a first pass. The last five exercises in 
this section introduce two additional variance reduction techniques, control variates and antithetic variates. 
These can be useful in many types of simulation problems, but those problems can be difficult to identify. 


Solutions to Exercises 


1. We want to approximate the integral $\int_a^b g(x)\,dx$. Suppose that we use importance sampling with f
being the p.d.f. of the uniform distribution on the interval [a, b]. Then $g(x)/f(x) = (b - a)g(x)$ for
a < x < b. Now, (12.4.1) is the same as (12.4.2).


2. First, we shall describe the second method in the exercise. We wish to approximate the integral
$\int g(x)f(x)\,dx$ using importance sampling with importance function f. We should then simulate values
$X^{(i)}$ with p.d.f. f and compute
\[ Y^{(i)} = \frac{g(X^{(i)})f(X^{(i)})}{f(X^{(i)})} = g(X^{(i)}). \]
The importance sampling estimate is the average of the $Y^{(i)}$ values. Notice that this is precisely the
same as the first method in the exercise.


3. (a) This is a distribution for which the quantile function is easy to compute. The c.d.f. is $F(x) =
1 - (c/x)^{n/2}$ for x > c, so the quantile function is $F^{-1}(p) = c/(1 - p)^{2/n}$. So, simulate U having a
uniform distribution on the interval [0,1] and let $X = c/(1 - U)^{2/n}$. Then X has the p.d.f. f.

(b) Let
\[ a = \frac{\Gamma\left(\frac{1}{2}[m + n]\right) m^{m/2} n^{n/2}}{\Gamma\left(\frac{1}{2}m\right)\Gamma\left(\frac{1}{2}n\right)}. \]
Then the p.d.f. of Y is $g(x) = a x^{(m/2)-1}/(mx + n)^{(m+n)/2}$, for x > 0. Hence,
\[ \Pr(Y > c) = \int_c^\infty \frac{a x^{(m/2)-1}}{(mx + n)^{(m+n)/2}}\,dx. \]
We could approximate this by sampling lots of values $X^{(i)}$ with the p.d.f. f from part (a) and
then averaging the values $g(X^{(i)})/f(X^{(i)})$.

(c) The ratio g(x)/f(x) is, for x > c,
\[ \frac{g(x)}{f(x)} = \frac{a x^{(m+n)/2}}{c^{n/2}(n/2)(mx + n)^{(m+n)/2}} = \frac{a}{c^{n/2}(n/2)(m + n/x)^{(m+n)/2}}. \]
This function is fairly flat for large x. Since we are only interested in x > c in this exercise,
importance sampling will have us averaging random variables $g(X^{(i)})/f(X^{(i)})$ that are nearly
constant, hence the average should have small variance.
4. (a) If our 10000 exponentials are $X^{(1)}, \ldots, X^{(10000)}$, then our approximation is the average of the values
$\log(1 + X^{(i)})$. In two example simulations, I got averages of 0.5960 and 0.5952 with simulation
standard errors of 0.0042 both times.


(b) Using importance sampling with the importance function being the gamma p.d.f. with parameters
1.5 and 1, I got estimates of 0.5965 and 0.5971 with simulation standard errors of 0.0012 both
times.


(c) The reason that the simulations in part (b) have smaller simulation standard error is that the gamma
importance function is a constant times $x^{1/2}\exp(-x)$. The ratio of the integrand to the importance
function is a constant times $\log(1 + x)x^{-1/2}$, which is nearly constant itself.
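The two estimators compared in this exercise can be sketched side by side. The integral is taken to be $\int_0^\infty \log(1+x)\exp(-x)\,dx$, as the averages in part (a) indicate. The importance function is the gamma p.d.f. with parameters 1.5 and 1, namely $x^{1/2}\exp(-x)/\Gamma(1.5)$, so each importance sampling term is $\Gamma(1.5)\log(1+X)X^{-1/2}$. Function names and the seed are illustrative assumptions.

```python
import math
import random
import statistics

def plain_estimate(n, rng):
    """Average of log(1 + X) for exponential(1) draws, with its simulation std. error."""
    vals = [math.log(1.0 + rng.expovariate(1.0)) for _ in range(n)]
    return statistics.fmean(vals), statistics.stdev(vals) / math.sqrt(n)

def importance_estimate(n, rng):
    """Importance sampling with the gamma(1.5, 1) p.d.f. as the importance function."""
    const = math.gamma(1.5)                  # normalizing constant of the gamma p.d.f.
    vals = []
    for _ in range(n):
        x = rng.gammavariate(1.5, 1.0)       # shape 1.5, scale 1 (so rate 1)
        vals.append(const * math.log(1.0 + x) / math.sqrt(x))
    return statistics.fmean(vals), statistics.stdev(vals) / math.sqrt(n)

if __name__ == "__main__":
    rng = random.Random(0)
    print("plain:     ", plain_estimate(10_000, rng))
    print("importance:", importance_estimate(10_000, rng))
```

The second standard error should come out noticeably smaller than the first, in line with the explanation in part (c).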


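5. Let U have a uniform distribution on the interval [0,1], and let W be defined by Eq. (12.4.6). The
inverse transformation is
\[ u = \frac{\Phi\left(\frac{w - \mu_2}{\sigma_2}\right)}{\Phi\left(\frac{c_2 - \mu_2}{\sigma_2}\right)}. \]
The derivative of the inverse transformation is
\[ \frac{1}{(2\pi)^{1/2}\sigma_2\Phi\left(\frac{c_2 - \mu_2}{\sigma_2}\right)} \exp\left(-\frac{1}{2\sigma_2^2}(w - \mu_2)^2\right). \qquad (S.12.4) \]
Since the p.d.f. of U is constant, the p.d.f. of W is (S.12.4), which is the same as (12.4.5).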


6. (a) We can simulate truncated normals as follows. If U has a uniform distribution on the interval
[0, 1], then $X = \Phi^{-1}(\Phi(1) + U[1 - \Phi(1)])$ has the truncated normal distribution in the exercise. If
$X^{(1)}, \ldots, X^{(1000)}$ are our simulated values, then the estimate is the average of the $(1 - \Phi(1))[X^{(i)}]^2$
values. Three simulations of size 1000 each produced the estimates 0.4095, 0.3878, and 0.4060.


(b) If Y has an exponential distribution with parameter 0.5, and $X = (1 + Y)^{1/2}$, then we can find
the p.d.f. of X. The inverse transformation is $y = x^2 - 1$ with derivative 2x. The p.d.f. of X is
then $2x \cdot 0.5\exp(-0.5x^2 + 0.5)$. If $X^{(1)}, \ldots, X^{(1000)}$ are our simulated values, then the estimate is
the average of the $X^{(i)}\exp(-0.5)/(2\pi)^{1/2}$ values. Three simulations of size 1000 each produced
the estimates 0.3967, 0.3980, and 0.4016.


(c) The simulation standard errors of the simulations in part (a) were close to 0.008, while those from
part (b) were about 0.004, half as large. The reason is that the random variables averaged in
part (b) are closer to constant than those in part (a) since x is closer to constant than $x^2$.
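Both importance samplers from this exercise can be sketched as follows, assuming the target integral is $\int_1^\infty x^2 (2\pi)^{-1/2}\exp(-x^2/2)\,dx$, which is what the two estimators above imply. The function names and the seed are illustrative assumptions; `statistics.NormalDist` supplies $\Phi$ and $\Phi^{-1}$.

```python
import math
import random
import statistics
from statistics import NormalDist

STD_NORMAL = NormalDist()

def summarize(vals):
    """Return the average and its simulation standard error."""
    return statistics.fmean(vals), statistics.stdev(vals) / math.sqrt(len(vals))

def estimate_truncated_normal(n, rng):
    """Part (a): sample the normal truncated to (1, inf), average (1 - Phi(1)) X^2."""
    phi1 = STD_NORMAL.cdf(1.0)
    tail = 1.0 - phi1
    vals = []
    for _ in range(n):
        p = min(phi1 + rng.random() * tail, 1.0 - 2.0 ** -53)  # keep p strictly below 1
        x = STD_NORMAL.inv_cdf(p)
        vals.append(tail * x * x)
    return summarize(vals)

def estimate_shifted_exponential(n, rng):
    """Part (b): X = (1 + Y)^(1/2) with Y exponential(0.5), average X exp(-0.5)/(2 pi)^(1/2)."""
    const = math.exp(-0.5) / math.sqrt(2.0 * math.pi)
    vals = [const * math.sqrt(1.0 + rng.expovariate(0.5)) for _ in range(n)]
    return summarize(vals)

if __name__ == "__main__":
    rng = random.Random(0)
    print("part (a):", estimate_truncated_normal(1000, rng))
    print("part (b):", estimate_shifted_exponential(1000, rng))
```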

7. (a) We can simulate bivariate normals by simulating one of the marginals first and then simulating the
second coordinate conditional on the first one. For example, if we simulate $X_1^{(i)}$ and $U^{(i)}$ as independent
normal random variables with mean 0 and variance 1, we can simulate $X_2^{(i)} = 0.5X_1^{(i)} + U^{(i)}\,0.75^{1/2}$.
Three simulations of size 10000 each produced estimates of 0.8285, 0.8308, and 0.8316 with
simulation standard errors of 0.0037 each time.


(b) Using the method of Example 12.4.3, we did three simulations of size 10000 each and got estimates
of 0.8386, 0.8387, and 0.8386 with simulation standard errors of about $3.4 \times 10^{-5}$, about 0.01 as
large as those from part (a). Also, notice how much closer the three simulations are in part (b)
compared to the three in part (a).
8. The random variables that are averaged to compute the importance sampling estimator are $Y^{(i)} =
g(X^{(i)})/f(X^{(i)})$, where the $X^{(i)}$'s have the p.d.f. f. Since g/f is bounded, $Y^{(i)}$ has finite variance.


9. The inverse transformation is v = F(x) with derivative f(x). So, the p.d.f. of X is f(x)/(b - a) for
those x that can arise as values of $F^{-1}(V)$, namely $F^{-1}(a) < x < F^{-1}(b)$.


10. For part (a), the stratified importance samples can be found by replacing U in the formula used in
Exercise 6(a) by a + U(b - a) where (a, b) is one of the pairs (0, .2), (.2, .4), (.4, .6), (.6, .8), or (.8, 1).
For part (b), replace Y by $-\log(1 - [a + U(b - a)])$ in the formula $X = (1 + Y)^{1/2}$ using the same five
(a, b) pairs. Three simulations using five intervals with 200 samples each produced estimates of 0.4018,
0.4029, and 0.2963 in part (a) and 0.4022, 0.4016, and 0.4012 in part (b). The simulation standard
errors were about 0.0016 in part (a) and 0.0006 in part (b). Both parts have simulation standard errors
about 1/5 or 1/6 the size of those in Exercise 6.


11. Since the conditional p.d.f. of $X^*$ given J = j is $f_j$, the marginal p.d.f. of $X^*$ is
\[ f^*(x) = \sum_{j=1}^{k} f_j(x)\Pr(J = j) = \frac{1}{k}\sum_{j=1}^{k} f_j(x). \]
Since $f_j(x) = kf(x)$ for $q_{j-1} \le x < q_j$, for each x there is one and only one $f_j(x) > 0$. Hence,
$f^*(x) = f(x)$ for all x.


12. (a) The m.g.f. of a Laplace distribution with parameters 0 and $\sigma$ is
\[ \psi(t) = \int_{-\infty}^{\infty} \exp(tx)\frac{1}{2\sigma}\exp(-|x|/\sigma)\,dx. \]
The integral from $-\infty$ to 0 is finite if and only if $t > -1/\sigma$. The integral from 0 to $\infty$ is finite
if and only if $t < 1/\sigma$. So the integral is finite if and only if $-1/\sigma < t < 1/\sigma$. The value of the
integral is
\[ \frac{1}{2\sigma}\left[\frac{1}{t + 1/\sigma} + \frac{1}{-t + 1/\sigma}\right] = \frac{1}{1 - t^2\sigma^2}. \]
Plugging $\sigma^2 = u/100$ into this gives the expression in the exercise.

(b) With u = 1, $\psi(1) = 1/0.99$. With r = 0.06, we get $\alpha = 0.06 + \log(0.99) = 0.04995$. We ran
three simulations of size 100000 each using the method described in the solution to Exercise 15 in
Sec. 12.2. The estimated prices were $S_0$ times 0.0844, 0.0838, and 0.0843. The simulation standard
errors were all about $3.659 \times 10^{-4}$.


(c) $S_u > S_0$ if and only if $W_u > -\alpha u$; in this case $\alpha u = 0.04995$. The conditional c.d.f. of $W_u$ given
that $W_u > -0.04995$ is
\[ F(w) = 1.4356 \times \begin{cases} 0.5[\exp(10w) - 0.6068] & \text{if } -0.04995 < w \le 0, \\ 1 - 0.5[\exp(-10w) + 0.6068] & \text{if } w > 0. \end{cases} \]
The quantile function is then
\[ F^{-1}(p) = \begin{cases} 0.1\log(1.3931p + 0.6068) & \text{if } 0 < p < 0.2822, \\ -0.1\log(2[1 - 0.6966p] - 0.6068) & \text{if } 0.2822 \le p < 1. \end{cases} \]

When we use samples from this conditional distribution, we need to divide the average by 1.4356, 
which is the ratio of the conditional p.d.f. to the unconditional p.d.f. We ran three more simulations 
of size 100000 each and got estimates of So times 0.0845, 0.0846, and 0.0840 with simulation 
standard errors of about 2.66.59 x 10-4. The simulation standard error is only a little smaller than 
it was in part (b). 
13. (a) E(Z) = E(Y) + kc = E(W) - kE(V) + kc. By the usual importance sampling argument,
$E(W) = \int g(x)\,dx$ and E(V) = c, so $E(Z) = \int g(x)\,dx$.


(b) $\mathrm{Var}(Z) = \frac{1}{v}[\sigma_W^2 + k^2\sigma_V^2 - 2k\rho\sigma_W\sigma_V]$. This is a quadratic in k that is minimized when $k = \rho\sigma_W/\sigma_V$.


14. (a) We know that $\int_0^1 (1 + x^2)^{-1}\,dx = \pi/4$. We shall use $f(x) = \exp(-x)/(1 - \exp(-1))$ for 0 < x < 1.
We shall simulate $X^{(1)}, \ldots, X^{(10000)}$ with this p.d.f. and compute
\[ W^{(i)} = \frac{1 - \exp(-1)}{1 + [X^{(i)}]^2}, \]
\[ V^{(i)} = \frac{\exp[X^{(i)}](1 - \exp(-1))}{1 + [X^{(i)}]^2}. \]
We ran three simulations of 10000 each and got estimates of the integral equal to 0.5248, 0.5262,
and 0.5244 with simulation standard errors around 0.00135. This compares to 0.00097 in
Example 12.4.1. We shall see what went wrong in part (b).


(b) We use the samples in our simulation to estimate $\sigma_W$ at 0.0964, $\sigma_V$ at 0.0710, and $\rho$ at -0.8683.
Since the correlation appears to be negative, we should have used a negative value of k to multiply
our control variate. Based on our estimates, we might use k = -1.1789. Additional simulations
using this value of k produce simulation standard errors around $4.8 \times 10^{-4}$.
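The control variate calculation can be sketched as follows, under the assumption that the target is $\int_0^1 \exp(-x)/(1 + x^2)\,dx$ with control function $h(x) = 1/(1 + x^2)$, whose integral $\pi/4$ is known. Here k is estimated from the same sample as the sample covariance divided by the sample variance of V, which equals $\rho\sigma_W/\sigma_V$; that choice, the function name, and the seed are illustrative assumptions rather than the procedure used for the numbers above.

```python
import math
import random
import statistics

C = 1.0 - math.exp(-1.0)     # normalizing constant of f(x) = exp(-x)/C on (0, 1)
KNOWN = math.pi / 4.0        # known integral of h(x) = 1/(1 + x^2) over (0, 1)

def control_variate_estimate(n, seed=0):
    """Return (estimate, simulation std. error, estimated k)."""
    rng = random.Random(seed)
    # Inverse c.d.f. draws from f: F(x) = (1 - exp(-x))/C.
    xs = [-math.log(1.0 - rng.random() * C) for _ in range(n)]
    w = [C / (1.0 + x * x) for x in xs]                  # g(X)/f(X)
    v = [math.exp(x) * C / (1.0 + x * x) for x in xs]    # h(X)/f(X)
    mw, mv = statistics.fmean(w), statistics.fmean(v)
    cov = sum((a - mw) * (b - mv) for a, b in zip(w, v)) / (n - 1)
    k = cov / statistics.variance(v)                     # rho * sigma_W / sigma_V
    y = [a - k * b for a, b in zip(w, v)]
    z = statistics.fmean(y) + k * KNOWN
    return z, statistics.stdev(y) / math.sqrt(n), k

if __name__ == "__main__":
    print(control_variate_estimate(10_000))
```

The estimated k should be negative, as part (b) anticipates.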


15. (a) Since $U^{(i)}$ and $1 - U^{(i)}$ both have uniform distributions on the interval [0,1], $X^{(i)} = F^{-1}(U^{(i)})$
and $T^{(i)} = F^{-1}(1 - U^{(i)})$ have the same distribution.


(b) Since $X^{(i)}$ and $T^{(i)}$ have the same distribution, so do $W^{(i)}$ and $V^{(i)}$, so the means of $W^{(i)}$ and $V^{(i)}$
are both the same and they are both $\int g(x)\,dx$, according to the importance sampling argument.


(c) Since $F^{-1}$ is a monotone increasing function, we know that $X^{(i)}$ and $T^{(i)}$ are decreasing functions
of each other. If g(x)/f(x) is monotone, then $W^{(i)}$ and $V^{(i)}$ will also be decreasing functions of
each other. As such they ought to be negatively correlated since one is small when the other is
large.


(d) $\mathrm{Var}(Z) = \mathrm{Var}(Y^{(i)})/v$, and
\[ \mathrm{Var}(Y^{(i)}) = 0.25[\mathrm{Var}(W^{(i)}) + \mathrm{Var}(V^{(i)}) + 2\mathrm{Cov}(W^{(i)}, V^{(i)})] = 0.5(1 + \rho)\mathrm{Var}(W^{(i)}). \]
Without antithetic variates, we get a variance of $\mathrm{Var}(W^{(i)})/[2v]$. If $\rho < 0$, then $0.5(1 + \rho) < 0.5$
and Var(Z) is smaller than we get without antithetic variates.


16. Using the method outlined in Exercise 15, we did three simulations of size 5000 each and got estimates
of 0.5250, 0.5247, and 0.5251 with estimates of $\mathrm{Var}(Y^{(i)})^{1/2}$ of about 0.0238, approximately 1/4 of $\hat{\sigma}_3$
from Example 12.4.1.
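An antithetic variates run of this kind can be sketched as below. The sketch assumes that the integral is $\int_0^1 \exp(-x)/(1 + x^2)\,dx$ and that the importance function is $f(x) = \exp(-x)/(1 - \exp(-1))$ on (0, 1); neither assumption is stated in this solution, and the function name and seed are hypothetical.

```python
import math
import random
import statistics

C = 1.0 - math.exp(-1.0)   # normalizing constant of the assumed f(x) = exp(-x)/C

def ratio_at(u):
    """g(X)/f(X) = C/(1 + X^2) evaluated at X = F^{-1}(u) = -log(1 - u C)."""
    x = -math.log(1.0 - u * C)
    return C / (1.0 + x * x)

def antithetic_estimate(n_pairs, seed=0):
    """Average Y = (W + V)/2, where W uses U and V uses 1 - U."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n_pairs):
        u = rng.random()
        ys.append(0.5 * (ratio_at(u) + ratio_at(1.0 - u)))
    return statistics.fmean(ys), statistics.stdev(ys)

if __name__ == "__main__":
    estimate, sd_y = antithetic_estimate(5000)
    print("estimate:", estimate, "sd of Y:", sd_y)
```

Because the ratio g/f is monotone in x here, W and V are negatively correlated, which is the situation described in Exercise 15(c).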


17. In Exercise 3(c), g(x)/f(x) is a monotone function of x, so antithetic variates should help. In Exercise 4(b),
we could use control variates with $h(x) = \exp(-x)$. In Exercises 6(a) and 6(b) the ratios g(x)/f(x)
are monotone, so antithetic variates should help. Control variates with $h(x) = x\exp(-x^2/2)$ could also
help in Exercise 6(a). Exercise 10 involves the same function, so the same methods could also be used
in the stratified versions.
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12.5 Markov Chain Monte Carlo 


Commentary 


Markov chain Monte Carlo (MCMC) is primarily used to simulate parameters in a Bayesian analysis. Im- 
plementing Gibbs sampling in all but the simplest problems is generally a nontrivial programming task. 
Instructors should keep this in mind when assigning exercises. The less experience students have had with 
programming, the more help they will need in implementing Gibbs sampling. The theoretical justification 
given in the text relies on the material on Markov chains from Sec. 3.10, which might have been skipped 
earlier in the course. This material is not necessary for actually performing MCMC. 

If one is using the software R, there is no substitute for old-fashioned programming. (There is a package 
called BUGS, http://www.mrc-bsu.cam.ac.uk/bugs/, but I will not describe it here.) After the solutions, 
there is R code to do the calculations in Examples 12.5.6 and 12.5.7 in the text. 


Solutions to Exercises 


1. The conditional p.d.f. of $X_1$ given $X_2 = x_2$ is 

$g_1(x_1|x_2) = \frac{f(x_1, x_2)}{f_2(x_2)} = \frac{c\,g(x_1, x_2)}{f_2(x_2)} = \frac{c\,h_2(x_1)}{f_2(x_2)}$. 

Let $c_2 = c/f_2(x_2)$, which does not depend on $x_1$. 


2. Let $f_2(x_2) = \int f(x_1, x_2)\,dx_1$ stand for the marginal p.d.f. of $X_2$, and let $g_1(x_1|x_2) = f(x_1, x_2)/f_2(x_2)$ 
stand for the conditional p.d.f. of $X_1^{(i)}$ given $X_2^{(i)} = x_2$. We are supposing that $X_2^{(i)}$ has the marginal 
distribution with p.d.f. $f_2$. In step 2 of the Gibbs sampling algorithm, after $X_2^{(i)} = x_2$ is observed, 
$X_1^{(i+1)}$ is sampled from the distribution with p.d.f. $g_1(x_1|x_2)$. Hence, the joint p.d.f. of $(X_1^{(i+1)}, X_2^{(i)})$ 
is $f_2(x_2)g_1(x_1|x_2) = f(x_1, x_2)$. In particular $X_1^{(i+1)}$ has the same marginal distribution as $X_1$, and the 
same argument we just gave (with subscripts 1 and 2 switched and applying step 3 instead of 2 in the 
Gibbs sampling algorithm) shows that $(X_1^{(i+1)}, X_2^{(i+1)})$ has the same joint distribution as $(X_1^{(i)}, X_2^{(i)})$. 


3. Let h(z) stand for the p.f. or p.d.f. of the stationary distribution and let g(z|z') stand for the conditional 
p.d.f. or p.f. of $Z_{i+1}$ given $Z_i = z'$, which is assumed to be the same for all i. Suppose that $Z_i$ has 
the stationary distribution for some i, then $(Z_i, Z_{i+1})$ has the joint p.f. or p.d.f. $h(z_i)g(z_{i+1}|z_i)$. Since 
$Z_1$ does have the stationary distribution, $(Z_1, Z_2)$ has the joint p.f. or p.d.f. $h(z_1)g(z_2|z_1)$. Hence, 
$(Z_1, Z_2)$ has the same distribution as $(Z_i, Z_{i+1})$ whenever $Z_i$ has the stationary distribution. The proof 
is complete if we can show that $Z_i$ has the stationary distribution for every i. We shall show this by 
induction. We know that it is true for i = 1 (that is, $Z_1$ has the stationary distribution). Assume that 
each of $Z_1, \ldots, Z_k$ has the stationary distribution, and prove that $Z_{k+1}$ has the stationary distribution. 
Since h is the p.d.f. or p.f. of the stationary distribution, it follows that the marginal p.d.f. or p.f. of 
$Z_{k+1}$ is $\int h(z_k)g(z_{k+1}|z_k)\,dz_k$ or $\sum_{z_k} h(z_k)g(z_{k+1}|z_k)$, either of which is $h(z_{k+1})$ by the definition of 
stationary distribution. Hence $Z_{k+1}$ also has the stationary distribution, and the induction proof is 
complete. 


4. $\mathrm{Var}(\bar X) = \sigma^2/n$ and 

$\mathrm{Var}(\bar Y) = \frac{\sigma^2}{n} + \frac{1}{n^2}\sum_{i \neq j} \mathrm{Cov}(Y_i, Y_j)$. 

Since $\mathrm{Cov}(Y_i, Y_j) > 0$, $\mathrm{Var}(\bar Y) > \sigma^2/n = \mathrm{Var}(\bar X)$. 
5. The sample average of all 30 observations is 1.442, and the value of $s_n^2$ is 2.671. The posterior 
hyperparameters are then 

$\alpha_1 = 15.5$, $\lambda_1 = 31$, $\mu_1 = 1.4277$, and $\beta_1 = 1.930$. 

The method described in Example 12.5.1 says to simulate values of $\mu$ having the normal distribution 
with mean 1.4277 and variance $(31\tau)^{-1}$ and to simulate values of $\tau$ having the gamma distribution with 
parameters 16 and $1.930 + 0.5 \times 31(\mu - 1.4277)^2$. In my particular simulation, I used five Markov chains with 
the following starting values for $\mu$: 0.4, 1.0, 1.4, 1.8, and 2.2. The convergence criterion was met very 
quickly, but we did 100 burn-in iterations anyway. The estimated mean of $(\sqrt{\tau}\mu)^{-1}$ was 0.2542 with simulation 
standard error $4.71 \times 10^{-4}$. 
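
The posterior hyperparameters quoted above can be reproduced from the data summaries with the usual normal-gamma update. The sketch below is hedged: the prior values $\mu_0 = 1$, $\lambda_0 = 1$, $\alpha_0 = 0.5$, $\beta_0 = 0.5$ are not stated in this solution; they are an assumption chosen because they reproduce the reported numbers, and `normal_gamma_update` is an illustrative name.

```python
def normal_gamma_update(mu0, lam0, alpha0, beta0, n, xbar, s2):
    """Posterior hyperparameters for a normal-gamma prior.

    s2 is the sum of squared deviations from the sample mean.
    """
    lam1 = lam0 + n
    mu1 = (lam0 * mu0 + n * xbar) / lam1
    alpha1 = alpha0 + n / 2
    beta1 = beta0 + s2 / 2 + n * lam0 * (xbar - mu0) ** 2 / (2 * lam1)
    return mu1, lam1, alpha1, beta1


# Prior values below are assumed (see the lead-in); the data summaries are
# the ones reported in the solution.
mu1, lam1, alpha1, beta1 = normal_gamma_update(
    mu0=1.0, lam0=1.0, alpha0=0.5, beta0=0.5, n=30, xbar=1.442, s2=2.671
)
print(mu1, lam1, alpha1, beta1)
```

Each Gibbs iteration then alternates a normal draw for $\mu$ given $\tau$ with a gamma draw for $\tau$ given $\mu$, using these four numbers.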


6. The data summaries that we need to follow the pattern of Example 12.5.4 are the following: 

$\bar x_1 = 12.5$, $\bar x_2 = 47.89$, $\bar y = 2341.4$, 
$s_{11} = 50020$, $s_{12} = 16737$, $s_{22} = 61990.47$, 
$s_{1y} = 927865$, $s_{2y} = 3132934$, $s_{yy} = 169378608$, 
and n = 26. 


(a) The histogram of 1a | values is in Fig. $.12.3. 




Figure 8.12.3: Sample c.d.f. of (a0? values for Exercise 6a in Sec. 12.5. 


(b) i. The histogram of $\beta_0 + 26\beta_1 + 67.2\beta_2$ values is in Fig. S.12.4. 
ii. Let z' = (1, 26, 67.2) as in Example 11.5.7 of the text. To create the predictions, we take 
each of the values in the histogram in Fig. S.12.4 and add a pseudo-random normal variable 
to each with mean 0 and variance 


1/2 
[t+ 2(Z'Z) 12]? 0-2, 
We then use the sample 0.05 and 0.95 quantiles as the endpoints of our interval. In three 
separate simulations, I got the following intervals: (3652, 5107), (3650, 5103), and (3666, 5131). 
These are all slightly wider than the interval in Example 11.5.7. 

Figure S.12.4: Histogram of $\beta_0 + 26\beta_1 + 67.2\beta_2$ values for Exercise 6(b)i in Sec. 12.5. 

Figure S.12.5: Histogram of predicted values for 1986 sales in Exercise 6(b)iii in Sec. 12.5. 


iii. The histogram of the sales figures used in Exercise 6(b)ii is in Fig. S.12.5. This histogram 
has more spread in it than the one in Fig. S.12.4 because the 1986 predictions equal the 1986 
parameters plus independent random variables (as described in part (b)ii). The addition of 
the independent random variables increases the variance. 


7. There are $n_i = 6$ observations in each of p = 3 groups. The sample averages are 825.83, 845.0, and 
775.0. The $w_i$ values are 570.83, 200.0, and 900.0. In three separate simulations of size 10000 each, I 
got the following three vectors of posterior mean estimates: (826.8, 843.2, 783.3), (826.8, 843.2, 783.1), 
and (826.8, 843.2, 783.2). 


8. (a) To prove that the two models are the same, we need to prove that we get Model 1 when we 
integrate $\tau_1, \ldots, \tau_n$ out of Model 2. Since the $\tau_i$'s are independent, the $Y_i$'s remain independent 
after integrating out the $\tau_i$'s. In Model 2, $[Y_i - (\beta_0 + \beta_1 x_i)]\tau_i^{1/2}$ has the standard normal distribution 
given $\tau_i$, and is therefore independent of $\tau_i$. Also, $\tau_i a \sigma^2$ has the $\chi^2$ distribution with a degrees of 
freedom, so 

$\frac{[Y_i - (\beta_0 + \beta_1 x_i)]\tau_i^{1/2}}{(\tau_i a \sigma^2/a)^{1/2}} = \frac{Y_i - (\beta_0 + \beta_1 x_i)}{\sigma}$ 

has the t distribution with a degrees of freedom, which is the same as Model 1. 


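(b) The prior p.d.f. is a constant times 

$\eta^{b/2-1} \exp(-f\eta/2) \prod_{i=1}^n \left[\tau_i^{a/2-1}\eta^{a/2} \exp(-a\eta\tau_i/2)\right]$, 

while the likelihood is 

$\prod_{i=1}^n \left[\tau_i^{1/2} \exp(-[y_i - \beta_0 - \beta_1 x_i]^2\tau_i/2)\right]$. 

The product of these two produces Eq. (12.5.4). 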


(c) As a function of $\eta$, we have $\eta$ to the power (na + b)/2 - 1 times e to the power of $-\eta/2$ times 
$f + a\sum_{i=1}^n \tau_i$. This is, aside from a constant factor, the p.d.f. of the gamma distribution with 
parameters (na + b)/2 and $(f + a\sum_{i=1}^n \tau_i)/2$. As a function of $\tau_i$, we have $\tau_i$ to the power (a+1)/2 - 1 
times e to the power $-\tau_i[a\eta + (y_i - \beta_0 - \beta_1 x_i)^2]/2$, which is (aside from a constant factor) the 
p.d.f. of the gamma distribution with parameters (a + 1)/2 and $[a\eta + (y_i - \beta_0 - \beta_1 x_i)^2]/2$. As a 
function of $\beta_0$, we have a constant times e to the power 

$-\sum_{i=1}^n \tau_i(\beta_0 - [y_i - \beta_1 x_i])^2/2 = -\frac{1}{2}\sum_{i=1}^n \tau_i \left(\beta_0 - \frac{\sum_{i=1}^n \tau_i(y_i - \beta_1 x_i)}{\sum_{i=1}^n \tau_i}\right)^2 + c$, 

where c does not depend on $\beta_0$. (Use the method of completing the square.) This is a constant 
times the p.d.f. of the normal distribution stated in the exercise. Completing the square as a 
function of $\beta_1$ produces the result stated for $\beta_1$ in the exercise. 


9. In three separate simulations of size 10000 each I got posterior mean estimates for $(\beta_0, \beta_1, \eta)$ of 
(—0.9526, 0.02052, 1.124 x 10-°), (—0.9593, 0.02056, 1.143 x 10-°), and (—0.9491, 0.02050, 1.138 x 10°). 
It appears we need more than 10000 samples to get a good estimate of the posterior mean of 89. The esti- 
mated posterior standard deviations from the three simulations were (1.503 x 107, 7.412 x 10~°, 7.899 x 
10-*),.(2.388 «.10-, 1.178 x 10-*, 5.799 « 10°), and (2.287% 10-7, 1.274% 10-*, 6.858 x 107°), 


10. Let the proper prior have hyperparameters $\mu_0$, $\lambda_0$, $\alpha_0$, and $\beta_0$. Conditional on the $Y_i$'s, those $X_i$'s that 
have $Y_i = 1$ are an i.i.d. sample of size $\sum_{i=1}^n Y_i$ from the normal distribution with mean $\mu$ and precision 
$\tau$. 


(a) The conditional distribution of $\mu$ given all else is the normal distribution with mean equal to 

$\frac{\lambda_0\mu_0 + \sum_{i=1}^n Y_i X_i}{\lambda_0 + \sum_{i=1}^n Y_i}$, and precision equal to $\tau\left(\lambda_0 + \sum_{i=1}^n Y_i\right)$. 


(b) The conditional distribution of $\tau$ given all else is the gamma distribution with parameters 
$\alpha_0 + \sum_{i=1}^n Y_i/2 + 1/2$ and 

$\beta_0 + \frac{1}{2}\left[\lambda_0(\mu - \mu_0)^2 + \sum_{i=1}^n Y_i(X_i - \mu)^2\right]$. 


(c) Given everything except $Y_j$, 

$\Pr(Y_j = 1) = \frac{\tau^{1/2}\exp\left(-\frac{\tau}{2}[X_j - \mu]^2\right)}{\tau^{1/2}\exp\left(-\frac{\tau}{2}[X_j - \mu]^2\right) + \exp\left(-\frac{1}{2}X_j^2\right)}$. 


(d) To use Gibbs sampling, we need starting values for all but one of the unknowns. For example, 
we could randomly assign the data values to the two distributions with probabilities 1/2 each or 
randomly split the data into two equal-sized subsets. Given starting values for the $Y_i$'s, we could 
start $\mu$ and $\tau$ at their posterior means given the observations that came from the distribution 
with unknown parameters. We would then cycle through simulating random variables with the 
distributions in parts (a)-(c). After burn-in and a large simulation run, estimate the posterior 
means by the averages of the sample parameters in the large simulation run. 


(e) The posterior mean of Y; is the posterior probability that Y; = 1. Since Y; = 1 is the same as the 
event that X; came from the distribution with unknown mean and variance, the posterior mean 
of Y; is the posterior probability that X; came from the distribution with unknown mean and 
variance. 
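
The cycle described in parts (a)-(d) can be sketched as follows. This is a minimal illustration, not the code used for these solutions: it assumes the second component is standard normal with prior probability 1/2 for each component, starts $\tau$ at 1 rather than at a posterior mean, and uses hypothetical data and hyperparameter values. The function name `gibbs_mixture` is ours.

```python
import math
import random


def gibbs_mixture(x, mu0, lam0, alpha0, beta0, iters=500, burn_in=100, seed=1):
    """Estimate Pr(Y_i = 1 | data) for each observation by Gibbs sampling.

    Y_i = 1 means X_i came from the normal distribution with unknown mean mu
    and precision tau; Y_i = 0 means it came from the standard normal.
    """
    rng = random.Random(seed)
    n = len(x)
    y = [rng.randint(0, 1) for _ in range(n)]  # random starting assignment
    tau = 1.0                                   # arbitrary starting precision
    y_totals = [0] * n
    for it in range(burn_in + iters):
        m = sum(y)
        sx = sum(xi for xi, yi in zip(x, y) if yi == 1)
        # Part (a): mu given everything else is normal.
        mu = rng.gauss((lam0 * mu0 + sx) / (lam0 + m),
                       1.0 / math.sqrt(tau * (lam0 + m)))
        # Part (b): tau given everything else is gamma (shape, rate).
        shape = alpha0 + m / 2 + 0.5
        rate = beta0 + 0.5 * (lam0 * (mu - mu0) ** 2
                              + sum(yi * (xi - mu) ** 2 for xi, yi in zip(x, y)))
        tau = rng.gammavariate(shape, 1.0 / rate)  # second argument is a scale
        # Part (c): each Y_j given everything else is Bernoulli.
        for j, xj in enumerate(x):
            a = math.sqrt(tau) * math.exp(-0.5 * tau * (xj - mu) ** 2)
            b = math.exp(-0.5 * xj ** 2)
            y[j] = 1 if rng.random() < a / (a + b) else 0
        if it >= burn_in:
            for j in range(n):
                y_totals[j] += y[j]
    return [total / iters for total in y_totals]


# Hypothetical data and hyperparameters, for illustration only.
data = [-0.5, 0.2, 0.1, -1.0, 0.8, 6.0, 6.5, 5.8]
probs = gibbs_mixture(data, mu0=0.0, lam0=1.0, alpha0=0.5, beta0=0.5)
print(probs)
```

The returned averages of the $Y_i$ draws are the kind of estimated probabilities discussed in part (e).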


11. For this exercise, I ran five Markov chains for 10000 iterations each. For each iteration, I obtain a 
vector of 10 $Y_i$ values. Our estimated probability that $X_i$ came from the distribution with unknown 
mean and variance equals the average of the 50000 $Y_i$ values for each i = 1, ..., 10. The ten estimated 
probabilities for each of my three runs are listed below: 


Run   Estimated Probabilities 
1     0.291 0.292 0.302 0.339 0.370 0.281 0.651 0.374 0.943 0.816 
2     0.285 0.286 0.302 0.339 0.375 0.280 0.656 0.371 0.945 0.819 
3     0.283 0.286 0.301 0.340 0.373 0.280 0.651 0.370 0.945 0.820 


12. Note that $\gamma_0$ should be the precision rather than the variance of the prior distribution of $\mu$. 


(a) The prior p.d.f. times the likelihood equals a constant times 

$\tau^{n/2} \exp\left(-\frac{\tau}{2}\left\{n[\bar x_n - \mu]^2 + s_n^2\right\}\right) \exp\left(-\frac{\gamma_0}{2}[\mu - \mu_0]^2\right) \tau^{\alpha_0 - 1} \exp\left(-\frac{\beta_0\tau}{2}\right)$, 

where $s_n^2 = \sum_{i=1}^n (x_i - \bar x_n)^2$. As a function of $\tau$ this looks like the p.d.f. of the gamma distribution 
with parameters $\alpha_0 + n/2$ and $[n(\bar x_n - \mu)^2 + s_n^2 + \beta_0]/2$. As a function of $\mu$, (by completing the 
square) it looks like the p.d.f. of the normal distribution with mean $(n\tau\bar x_n + \gamma_0\mu_0)/(n\tau + \gamma_0)$ and 
variance $1/(n\tau + \gamma_0)$. 


(b) The data summaries are n = 18, $\bar x_n = 182.17$, and $s_n^2 = 88678.5$. I ran five chains of length 10000 
each for three separate simulations. For each simulation, I obtained 50000 parameter pairs. To 
obtain the interval, I sorted the 50000 $\mu$ values and chose the 1250th and 48750th values. For the 
three simulations, I got the intervals (154.2, 216.2), (154.6, 216.5), and (154.7, 216.2). 
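
The rule in part (b), sort the draws and read off the 1250th and 48750th values, generalizes to any number of draws and any coverage level. A minimal sketch follows; `equal_tailed_interval` is an illustrative name and the rank convention (round n(1 ± level)/2 to the nearest integer) is our assumption, chosen to match the ranks quoted above.

```python
def equal_tailed_interval(draws, level=0.95):
    """Return the order statistics at ranks n(1-level)/2 and n(1+level)/2."""
    ordered = sorted(draws)
    n = len(ordered)
    lo_rank = max(1, round(n * (1 - level) / 2))   # 1-based rank
    hi_rank = min(n, round(n * (1 + level) / 2))   # 1-based rank
    return ordered[lo_rank - 1], ordered[hi_rank - 1]


# With 50000 draws equal to 1, ..., 50000, the chosen values are the ranks.
lower, upper = equal_tailed_interval(list(range(50000, 0, -1)))
print(lower, upper)
```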


13. In part (a), the exponent in the displayed formula should have been -1/2. 
(a) The conditional distribution of $(\mu - \mu_0)\gamma^{1/2}$ given $\gamma$ is standard normal, hence it is independent 
of $\gamma$. Also, the distribution of $2b_0\gamma$ is the $\chi^2$ distribution with $2a_0$ degrees of freedom. It follows 
that $(\mu - \mu_0)/(b_0/a_0)^{1/2}$ has the t distribution with $2a_0$ degrees of freedom. 

(b) The marginal prior distributions of $\tau$ are in the same form with the same hyperparameters in 
Exercise 12 and in Sec. 8.6. The marginal prior distributions of $\mu$ are in the same form also, but 
the hyperparameters are not identical. We need $a_0 = \alpha_0$ to make the degrees of freedom match, 
and we need $b_0 = \beta_0/\lambda_0$ in order to make the scale factor match. 


(c) The prior p.d.f. times the likelihood equals a constant times 

$\tau^{\alpha_0 + n/2 - 1}\gamma^{a_0 + 1/2 - 1} \exp\left(-\frac{\tau}{2}\left\{n[\bar x_n - \mu]^2 + s_n^2 + \beta_0\right\} - \gamma\left\{\frac{1}{2}[\mu - \mu_0]^2 + b_0\right\}\right)$. 

As a function of $\tau$ this is the same as in Exercise 12. As a function of $\mu$, it is also the same 
as Exercise 12 if we replace $\gamma_0$ by $\gamma$. As a function of $\gamma$, it looks like the p.d.f. of the gamma 
distribution with parameters $a_0 + 1/2$ and $b_0 + (\mu - \mu_0)^2/2$. 


(d) This time, I ran 10 chains of length 10000 each for three different simulations. The three intervals 
are found by sorting the $\mu$ values and using the 2500th and 97500th values. The intervals are 
(154.4, 216.3), (154.6, 215.8), and (154.4, 215.9). 


14. The exercise should have included that the prior hyperparameters are $\alpha_0 = 0.5$, $\mu_0 = 0$, $\lambda_0 = 1$, and 
$\beta_0 = 0.5$. 


(a) I used 10 chains of length 10000 each. 


(b) The histogram of predicted values is in Fig. S.12.6. There are two main differences between this 
histogram and the one in Fig. 12.10 in the text. First, the distribution of log-arsenic is centered 
at slightly higher values in this histogram. Second, the distribution is much less spread out in this 
histogram. (Notice the difference in horizontal scales between the two figures.) 

Figure S.12.6: Histogram of log-arsenic predictions for Exercise 14b in Sec. 12.5. 


(c) The median of predicted arsenic concentration is 1.525 in my simulation, compared to the smaller 
value 1.231 in Example 12.5.8, about 24% higher. 


15. (a) For each censored observation $X_{n+i}$, we observe only that $X_{n+i} \le c$. The probability of $X_{n+i} \le c$ 
given $\theta$ is $1 - \exp(-c\theta)$. The likelihood times prior is a constant times 

$\theta^{n+\alpha-1}[1 - \exp(-c\theta)]^m \exp\left(-\theta\sum_{i=1}^n x_i\right)$. (S.12.5) 

We can treat the unobserved values $X_{n+1}, \ldots, X_{n+m}$ as parameters. The conditional distribution 
of $X_{n+i}$ given $\theta$ and given that $X_{n+i} \le c$ has the p.d.f. 

$g(x|\theta) = \frac{\theta\exp(-\theta x)}{1 - \exp(-\theta c)}$, for 0 < x < c. (S.12.6) 

If we multiply the conditional p.d.f. of $(X_{n+1}, \ldots, X_{n+m})$ given $\theta$ times Eq. (S.12.5), we get 

$\theta^{n+m+\alpha-1} \exp\left(-\theta\sum_{i=1}^{n+m} x_i\right)$, 

for $\theta > 0$ and $0 < x_i < c$ for i = n+1, ..., n+m. As a function of $\theta$, this looks like the p.d.f. of 
the gamma distribution with parameters $n + m + \alpha$ and $\sum_{i=1}^{n+m} x_i$. As a function of $X_{n+i}$, it looks 
like the p.d.f. in Eq. (S.12.6). So, Gibbs sampling can work as follows. Pick a starting value for $\theta$, 
such as one over the average of the uncensored values. Then simulate the censored observations 
with p.d.f. (S.12.6). This can be done using the quantile function 

$G^{-1}(p) = -\frac{\log(1 - p[1 - \exp(-c\theta)])}{\theta}$. 

Then, simulate a new $\theta$ from the gamma distribution mentioned above to complete one iteration. 


(b) For each censored observation $X_{n+i}$, we observe only that $X_{n+i} \ge c$. The probability of $X_{n+i} \ge c$ 
given $\theta$ is $\exp(-c\theta)$. The likelihood times prior is a constant times 

$\theta^{n+\alpha-1} \exp\left(-\theta\left[mc + \sum_{i=1}^n x_i\right]\right)$. (S.12.7) 

We could treat the unobserved values $X_{n+1}, \ldots, X_{n+m}$ as parameters. The conditional distribution 
of $X_{n+i}$ given $\theta$ and given that $X_{n+i} \ge c$ has the p.d.f. 

$g(x|\theta) = \theta\exp(-\theta[x - c])$, for x > c. (S.12.8) 

If we multiply the conditional p.d.f. of $(X_{n+1}, \ldots, X_{n+m})$ given $\theta$ times Eq. (S.12.7), we get 

$\theta^{n+m+\alpha-1} \exp\left(-\theta\sum_{i=1}^{n+m} x_i\right)$, 

for $\theta > 0$ and $x_i > c$ for i = n+1, ..., n+m. As a function of $\theta$, this looks like the p.d.f. of 
the gamma distribution with parameters $n + m + \alpha$ and $\sum_{i=1}^{n+m} x_i$. As a function of $X_{n+i}$, it looks 
like the p.d.f. in Eq. (S.12.8). So, Gibbs sampling can work as follows. Pick a starting value for 
$\theta$, such as the M.L.E., $n/\left(mc + \sum_{i=1}^n x_i\right)$. Then simulate the censored observations with p.d.f. (S.12.8). 
This can be done using the quantile function 

$G^{-1}(p) = c - \frac{\log(1 - p)}{\theta}$. 

Then, simulate a new $\theta$ from the gamma distribution mentioned above to complete one iteration. 
In this part of the exercise, Gibbs sampling is not really needed because the posterior distribution 
of $\theta$ is available in closed form. Notice that (S.12.7) is a constant times the p.d.f. of the gamma 
distribution with parameters $n + \alpha$ and $mc + \sum_{i=1}^n x_i$, which is then the posterior distribution of $\theta$. 


16. (a) The joint p.d.f. of $(X_i, Z_i)$ can be found from the joint p.d.f. of $(X_i, Y_i)$ and the transformation 
h(x, y) = (x, x + y). The joint p.d.f. of $(X_i, Y_i)$ is $f(x, y) = \lambda\mu\exp(-x\lambda - y\mu)$ for x, y > 0. The 
inverse transformation is $h^{-1}(x, z) = (x, z - x)$, with Jacobian equal to 1. So, the joint p.d.f. of 
$(X_i, Z_i)$ is 

$g(x, z) = f(x, z - x) = \lambda\mu\exp(-x[\lambda - \mu] - z\mu)$, for 0 < x < z. 

The marginal p.d.f. of $Z_i$ is the integral of this over x, namely 

$g_2(z) = \frac{\lambda\mu}{\lambda - \mu}[1 - \exp(-z[\lambda - \mu])]\exp(-z\mu)$, 

for z > 0. The conditional p.d.f. of $X_i$ given $Z_i = z$ is the ratio 

$\frac{g(x, z)}{g_2(z)} = \frac{(\lambda - \mu)\exp(-x[\lambda - \mu])}{1 - \exp(-z[\lambda - \mu])}$, for 0 < x < z. (S.12.9) 

The conditional c.d.f. of $X_i$ given $Z_i = z$ is the integral of this, which is the formula in the text. 
(b) The likelihood times prior is 

$\frac{\lambda^{n+a-1}\mu^{n+b-1}}{(\lambda - \mu)^{n-k}} \exp\left(-\lambda\sum_{i=1}^k x_i - \mu\left[\sum_{i=1}^k y_i + \sum_{i=k+1}^n z_i\right]\right) \prod_{i=k+1}^n [1 - \exp(-[\lambda - \mu]z_i)]$. (S.12.10) 

We can treat the unobserved pairs $(X_i, Y_i)$ for i = k+1, ..., n as parameters. Since we observe 
$X_i + Y_i = Z_i$, we shall just treat $X_i$ as a parameter. The conditional p.d.f. of $X_i$ given the other 
parameters and $Z_i$ is in (S.12.9). Multiplying the product of those p.d.f.'s for i = k+1, ..., n 
times (S.12.10) gives 

$\lambda^{n+a-1}\mu^{n+b-1} \exp\left(-\lambda\sum_{i=1}^n x_i - \mu\left[\sum_{i=1}^k y_i + \sum_{i=k+1}^n (z_i - x_i)\right]\right)$, (S.12.11) 

where $0 < x_i < z_i$ for i = k+1, ..., n. As a function of $\lambda$, (S.12.11) looks like the p.d.f. of the 
gamma distribution with parameters n + a and $\sum_{i=1}^n x_i$. As a function of $\mu$ it looks like the p.d.f. of 
the gamma distribution with parameters n + b and $\sum_{i=1}^n y_i$, where $y_i = z_i - x_i$ for i = k+1, ..., n. 
As a function of $x_i$ (i = k+1, ..., n), it 
looks like the p.d.f. in (S.12.9). So, Gibbs sampling can work as follows. Pick starting values for $\lambda$ 
and $\mu$, such as one over the averages of the observed values of the $X_i$'s and $Y_i$'s. Then simulate the 
unobserved $X_i$ values for i = k+1, ..., n using the probability integral transform. Then simulate 
new $\lambda$ and $\mu$ values using the gamma distributions mentioned above to complete one iteration. 
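
One iteration of the scheme in Exercise 16(b) might be sketched as follows. This is an illustration under stated assumptions: the inverse c.d.f. is obtained by integrating and inverting (S.12.9), the case $\lambda = \mu$ is handled by the uniform limit, and the data, the values of a and b, and the names `x_given_z_quantile` and `gibbs_sums` are all hypothetical.

```python
import math
import random


def x_given_z_quantile(p, z, lam, mu):
    """Inverse of the conditional c.d.f. of X given Z = z, from (S.12.9)."""
    d = lam - mu
    if d == 0.0:
        return p * z  # limiting case: X given Z = z is uniform on (0, z)
    return -math.log(1.0 - p * (1.0 - math.exp(-d * z))) / d


def gibbs_sums(xs, ys, zs, a, b, iters=200, seed=0):
    """xs, ys: fully observed pairs; zs: observed sums with X unobserved."""
    rng = random.Random(seed)
    n = len(xs) + len(zs)
    lam = len(xs) / sum(xs)  # start at one over the average observed X
    mu = len(ys) / sum(ys)   # start at one over the average observed Y
    draws = []
    for _ in range(iters):
        # Probability integral transform for each unobserved X_i.
        x_miss = [x_given_z_quantile(rng.random(), z, lam, mu) for z in zs]
        sum_x = sum(xs) + sum(x_miss)
        sum_y = sum(ys) + sum(z - x for z, x in zip(zs, x_miss))
        # gammavariate takes (shape, scale), so the rate is inverted.
        lam = rng.gammavariate(n + a, 1.0 / sum_x)
        mu = rng.gammavariate(n + b, 1.0 / sum_y)
        draws.append((lam, mu))
    return draws


draws = gibbs_sums([0.5, 1.2, 0.3], [2.0, 1.1, 0.7], [1.5, 2.5], a=1.0, b=1.0)
print(draws[-1])
```

The same inverse-c.d.f. idea applies to the truncated exponential in Exercise 15(a), with $\lambda - \mu$ replaced by $\theta$ and z by c.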


12.6 The Bootstrap 


Commentary 


The bootstrap has become a very popular technique for solving non-Bayesian problems that are not amenable 
to analysis. The nonparametric bootstrap can be implemented without much of the earlier material in this 
chapter. Indeed, one need only know how to simulate from a discrete uniform distribution (Example 12.3.11) 
and compute simulation standard errors (Sec. 12.2). 

The software R has a function boot that is available after issuing the command library(boot). The 
first three arguments to boot are a vector data containing the original sample, a function f to compute the 
statistic whose distribution is being bootstrapped, and the number of bootstrap samples to create. For the 
nonparametric bootstrap, the function f must have at least two arguments. The first will always be data, and 
the second will be a vector inds of integers of the same dimension as data. This vector inds will choose the 
bootstrap sample. The function should return the desired statistic computed from the sample data[inds]. 
Any additional arguments to f can be passed to boot by setting them explicitly at the end of the argument 
list. For the parametric bootstrap, boot needs the optional arguments sim="parametric" and ran.gen. 
The function ran.gen tells how to generate the bootstrap samples, and it takes two arguments. The first 
argument will be data. The second argument is anything else that you need to generate the samples, for 
example, estimates of parameters based on the original data. Also, f needs at least one argument which will 
be a simulated data set. Any additional arguments can be passed explicitly to boot. 
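
For readers not using R, the calling convention described above (a statistic that receives the data and a vector of indices) can be mimicked in a few lines. The sketch below is a Python analogue written for illustration; it is not the R boot function, the names `bootstrap` and `median_stat` are ours, and the indices are 0-based rather than 1-based as in R.

```python
import random
import statistics


def bootstrap(data, statistic, num_samples, seed=0, **kwargs):
    """Nonparametric bootstrap: statistic(data, inds, **kwargs) per resample."""
    rng = random.Random(seed)
    n = len(data)
    results = []
    for _ in range(num_samples):
        # Indices drawn with replacement from the discrete uniform on 0..n-1.
        inds = [rng.randrange(n) for _ in range(n)]
        results.append(statistic(data, inds, **kwargs))
    return results


def median_stat(data, inds):
    """Sample median of the bootstrap sample selected by inds."""
    return statistics.median(data[i] for i in inds)


sample = [3.1, 4.7, 5.0, 6.2, 8.9]  # hypothetical data
medians = bootstrap(sample, median_stat, 200)
bias_estimate = statistics.fmean(medians) - statistics.median(sample)
print(bias_estimate)
```

Several of the exercises below (bias of the sample median, variance of the sample correlation) fit this pattern with a different `statistic`.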


Solutions to Exercises 


1. We could start by estimating $\theta$ by the M.L.E., $1/\bar X$. Then we would use the exponential distribution 
with parameter $1/\bar X$ for the distribution $\hat F$ in the bootstrap. The bootstrap estimate of the variance 
of $\bar X$ is the variance of a sample average $\bar X^*$ of a sample of size n from the distribution $\hat F$, i.e., the 
exponential distribution with parameter $1/\bar X$. The variance of $\bar X^*$ is 1/n times the variance of a single 
observation from $\hat F$, which equals $\bar X^2$. So, the bootstrap estimate is $\bar X^2/n$. 


2. The numbers $x_1, \ldots, x_n$ are known when we sample from $F_n$. Let $i_1, \ldots, i_n \in \{1, \ldots, n\}$. Since $X_j^* = x_{i_j}$ 
if and only if $J_j = i_j$, we can compute 

$\Pr(X_1^* = x_{i_1}, \ldots, X_n^* = x_{i_n}) = \Pr(J_1 = i_1, \ldots, J_n = i_n) = \prod_{j=1}^n \Pr(J_j = i_j) = \prod_{j=1}^n \Pr(X_j^* = x_{i_j})$. 

The second equality follows from the fact that $J_1, \ldots, J_n$ are a random sample with replacement from 
the set {1, ..., n}. 


3. Let n = 2k + 1. The sample median of a nonparametric bootstrap sample is the (k+1)st smallest 
observation in the bootstrap sample. Let x denote the smallest observation in the original sample. 
Assume that there are $\ell$ observations from the original sample that equal x. (Usually $\ell = 1$, but it 
is not necessary.) The sample median from the bootstrap sample equals x from the original data set 
if and only if at least k + 1 observations in the bootstrap sample equal x. Since each observation 
in the bootstrap equals x with probability $\ell/n$ and the bootstrap observations are independent, the 
probability that at least k + 1 of them equal x is 

$\sum_{i=k+1}^n \binom{n}{i}\left(\frac{\ell}{n}\right)^i\left(1 - \frac{\ell}{n}\right)^{n-i}$. 
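
The binomial tail probability in this solution is easy to evaluate for a given n. A minimal sketch, assuming n is odd as in the exercise; the function name `prob_median_is_smallest` is illustrative.

```python
from math import comb


def prob_median_is_smallest(n, ell=1):
    """Pr(bootstrap sample median equals the smallest original value).

    n must be odd (n = 2k + 1); ell is the number of original observations
    tied at the smallest value.
    """
    k = (n - 1) // 2
    p = ell / n
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k + 1, n + 1))


# For n = 3 and ell = 1: at least 2 of 3 draws must hit the smallest value.
print(prob_median_is_smallest(3))
```

For n = 3 the two terms are 3(1/3)^2(2/3) and (1/3)^3, which sum to 7/27.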


4. For each bootstrap sample, compute the sample median. The bias estimate is the average of all of these 
sample medians minus the original sample median, 201.3. I started with a pilot sample of size 2000 
and estimated the bias as 0.545. The sample variance of the 2000 sample medians was 3.435. This led 
me to estimate the necessary simulation size as 

$\left[\Phi^{-1}\left(\frac{1 + 0.9}{2}\right)\frac{3.435^{1/2}}{0.02}\right]^2 = 23234$. 

So, I did 30000 bootstrap samples. The new estimate of bias was 0.5564, with a simulation standard 
error of 0.011. 
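
The simulation-size calculation used here and in Exercise 6(b) can be reproduced with the standard library. In this sketch the accuracy targets 0.02 (with probability 0.9) and 0.005 (with probability 0.95) are assumptions inferred from the reported sizes 23234 and 2899533; `simulation_size` is an illustrative name.

```python
from statistics import NormalDist


def simulation_size(sigma2, eps, gamma):
    """Size v such that an average is within eps of its mean with probability gamma."""
    z = NormalDist().inv_cdf((1 + gamma) / 2)
    return (z * sigma2 ** 0.5 / eps) ** 2


v_ex4 = simulation_size(3.435, 0.02, 0.9)     # pilot variance from Exercise 4
v_ex6 = simulation_size(18.87, 0.005, 0.95)   # sample variance from Exercise 6
print(round(v_ex4), round(v_ex6))
```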


5. This exercise is performed in a manner similar to Exercise 4. 

(a) In this case, I did three simulations of size 50000 each. The three estimates of bias were -1.684, 
-1.688, and -1.608. 


(b) Each time, the estimated sample size needed to achieve the desired accuracy was between 48000 
and 49000. 


6. (a) For each bootstrap sample, compute the sample median. The estimate we want is the sample 
variance of these values. I did a pilot simulation of size 2000 and got a sample variance of 18.15. 
I did another simulation of size 10000 and got a sample variance of 18.87. 


(b) To achieve the desired accuracy, we would need a simulation of size 

$\left[\Phi^{-1}\left(\frac{1 + 0.95}{2}\right)\frac{18.87^{1/2}}{0.005}\right]^2 = 2899533$. 

That is, we would need about three million bootstrap samples. 


7. (a) Each bootstrap sample consists of $\bar X^{*(i)}$ having a normal distribution with mean 0 and variance 
31.65/11, $\bar Y^{*(i)}$ having the normal distribution with mean 0 and variance 68.8/10, $S_X^{2*(i)}$ being 31.65 
times a $\chi^2$ random variable with 10 degrees of freedom, and $S_Y^{2*(i)}$ being 68.8 times a $\chi^2$ random 
variable with 9 degrees of freedom. For each sample, we compute the statistic U displayed in 
Example 12.6.10 in the text. We then compute what proportion of the absolute values of the 
10000 statistics exceed the 0.95 quantile of the t distribution with 19 degrees of freedom, 1.729. 
In three separate simulations, I got proportions of 0.1101, 0.1078, and 0.1115. 

(b) To correct the level of the test, we need the 0.9 quantile of the distribution of |U|. For each 
simulation, we sort the 10000 |U| values and select the 9000th value. In my three simulations, 
this value was 1.773, 1.777, and 1.788. 

(c) To compute the simulation standard error of the sample quantile, I chose to split the 10000 samples 
into eight sets of size 1250. For each set, I sort the |U| values and choose the 1125th one. The 
simulation standard error is then the square-root of one-eighth of the sample variance of 
these eight values. In my three simulations, I got the values 0.0112, 0.0136, and 0.0147. 


8. The correlation is the ratio of the covariance to the square-root of the product of the variances. The 
mean of $X^*$ is $E(X^*) = \bar X$, and the mean of $Y^*$ is $E(Y^*) = \bar Y$. The variance of $X^*$ is $\sum_{i=1}^n (X_i - \bar X)^2/n$, 
and the variance of $Y^*$ is $\sum_{i=1}^n (Y_i - \bar Y)^2/n$. The covariance is 

$E[(X^* - \bar X)(Y^* - \bar Y)] = \frac{1}{n}\sum_{i=1}^n (X_i - \bar X)(Y_i - \bar Y)$. 

Dividing this by the square-root of the product of the variances yields (12.6.2). 


9. (a) For each bootstrap sample, compute the sample correlation $R^{(i)}$. Then compute the sample 
variance of $R^{(1)}, \ldots, R^{(1000)}$. This is the approximation to the bootstrap estimate of the variance 
of the sample correlation. I did three separate simulations and got sample variances of $4.781 \times 10^{-4}$, 
$4.741 \times 10^{-4}$, and $4.986 \times 10^{-4}$. 

(b) The approximation to the bootstrap bias estimate is the sample average of $R^{(1)}, \ldots, R^{(1000)}$ minus 
the original sample correlation, 0.9670. In my three simulations, I got the values -0.0030, -0.0022, 
and -0.0026. It looks like 1000 is not enough bootstrap samples to get a good estimate of this 
bias. 

(c) For the simulation standard error of the variance estimate, we use the square-root of Eq. (12.2.3) 
where each $Y^{(i)}$ in (12.2.3) is $R^{(i)}$ in this exercise. In my three simulations, I got the values 
$2.231 \times 10^{-5}$, $2.734 \times 10^{-5}$, and $3.228 \times 10^{-5}$. For the simulation standard error of the bias 
estimate, we just note that the bias estimate is an average, so we need only calculate the square-root 
of 1/1000 times the sample variance of $R^{(1)}, \ldots, R^{(1000)}$. In my simulations, I got $6.915 \times 10^{-4}$, 
$6.886 \times 10^{-4}$, and $7.061 \times 10^{-4}$. 


10. For both parts (a) and (b), we need 10000 bootstrap samples. From each bootstrap sample, we compute 
the sample median. Call these values $M^{*(i)}$, for i = 1, ..., 10000. The median of the original data is 
M = 152.5. 


(a) Sort the $M^{*(i)}$ values from smallest to largest. The percentile interval just runs from the 500th 
sorted value to the 9500th sorted value. I ran three simulations and got the following three 
intervals: [148, 175], [148, 175], and [146.5, 175]. 


(b) Choose a measure of spread and compute it from the original sample. Call the value Y. For 
each bootstrap sample, compute the same measure of spread $Y^{*(i)}$. I chose the median absolute 
deviation, which is Y = 19 for this data set. Then sort the values $(M^{*(i)} - M)/Y^{*(i)}$. Find the 
500th and 9500th sorted values $Z_{(500)}$ and $Z_{(9500)}$. The percentile-t confidence interval runs from 
$M - Z_{(9500)}Y$ to $M - Z_{(500)}Y$. In my three simulations, I got the intervals [143, 181], [142.6, 181], and 
[141.9, 181]. 


(c) The sample average of the beef hot dog values is 156.9, and the value of $\sigma'$ is 22.64. The confidence 
interval based on the normal distribution uses the t distribution quantile $T_{19}^{-1}(0.95) = 1.729$ and 
equals $156.9 \pm 1.729 \times 22.64/20^{1/2}$, or [148.1, 165.6]. This interval is considerably shorter than 
either of the bootstrap intervals. 


11. (a) If $X^*$ has the distribution $F_n$, then $\mu = E(X^*) = \bar X$, 

$\sigma^2 = \mathrm{Var}(X^*) = \frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^2$, and 

$E([X^* - \mu]^3) = \frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^3$. 

Plugging these values into the formula for skewness (see Definition 4.4.1) yields the formula for 
$M_3$ given in this exercise. 


(b) The summary statistics of the 1970 fish price data are $\bar X = 41.1$, $\sum_{i=1}^n (X_i - \bar X)^2/n = 1316.5$, and 
$\sum_{i=1}^n (X_i - \bar X)^3/n = 58176$, so the sample skewness is $M_3 = 1.218$. For each bootstrap sample, 
we also compute the sample skewness $M_3^{*(i)}$ for i = 1, ..., 1000. The bias of $M_3$ is estimated by 
the sample average of the $M_3^{*(i)}$ minus $M_3$. I did three simulations and got the values -0.2537, 
-0.2936, and -0.2888. To estimate the standard deviation of $M_3$, compute the sample standard 
deviation of the $M_3^{*(i)}$. In my three simulations, I got 0.5480, 0.5590, and 0.5411. 
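
The plug-in skewness in Exercise 11 is a ratio of two sample moments, so the reported value can be checked directly from the summary statistics. The sketch below is illustrative; `sample_skewness` is our name, and the raw fish price data are not reproduced here.

```python
def sample_skewness(xs):
    """Plug-in skewness M3: third central moment over variance^(3/2), divisor n."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


# Check of the reported value using the summary statistics in part (b).
m3_from_summaries = 58176 / 1316.5 ** 1.5
print(round(m3_from_summaries, 3))
```

Applying `sample_skewness` to each bootstrap sample gives the $M_3^{*(i)}$ values whose average and standard deviation are used above.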


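12. We want to show that the distribution of R is the same for all parameter vectors $(\mu_x, \mu_y, \sigma_x^2, \sigma_y^2, \rho)$ 
that share the same value of $\rho$. Let $\theta_1 = (\mu_{x1}, \mu_{y1}, \sigma_{x1}^2, \sigma_{y1}^2, \rho)$ and $\theta_2 = (\mu_{x2}, \mu_{y2}, \sigma_{x2}^2, \sigma_{y2}^2, \rho)$ be two 
parameter vectors that share the same value of $\rho$. Let $a_x = \sigma_{x2}/\sigma_{x1}$, $a_y = \sigma_{y2}/\sigma_{y1}$, $b_x = \mu_{x2}/a_x - \mu_{x1}$, 
and $b_y = \mu_{y2}/a_y - \mu_{y1}$. For i = 1, 2, let $W_i$ be a sample of size n from a bivariate normal distribution 
with parameter vector $\theta_i$, and let $R_i$ be the sample correlation. We want to show that $R_1$ and $R_2$ have 
the same distribution. Write $W_i = [(X_{i1}, Y_{i1}), \ldots, (X_{in}, Y_{in})]$ for i = 1, 2. Define $X'_{2j} = a_x(X_{1j} + b_x)$, 
$Y'_{2j} = a_y(Y_{1j} + b_y)$ for j = 1, ..., n. Then it is trivial to see that $W'_2 = [(X'_{21}, Y'_{21}), \ldots, (X'_{2n}, Y'_{2n})]$ has 
the same distribution as $W_2$. Let $R'_2$ be the sample correlation computed from $W'_2$. Then $R'_2$ and $R_2$ 
have the same distribution. We complete the proof by showing that $R'_2 = R_1$. Hence $R'_2$ and $R_1$ and 
$R_2$ all have the same distribution. To see that $R'_2 = R_1$, let $\bar X_1 = \frac{1}{n}\sum_{j=1}^n X_{1j}$ and similarly $\bar Y_1$, $\bar X'_2$, and 
$\bar Y'_2$. Then $\bar X'_2 = a_x(\bar X_1 + b_x)$ and $\bar Y'_2 = a_y(\bar Y_1 + b_y)$. So, for each j, $X'_{2j} - \bar X'_2 = a_x(X_{1j} - \bar X_1)$ and 
$Y'_{2j} - \bar Y'_2 = a_y(Y_{1j} - \bar Y_1)$. Since $a_x, a_y > 0$, it follows that 

$R'_2 = \frac{\sum_{j=1}^n (X'_{2j} - \bar X'_2)(Y'_{2j} - \bar Y'_2)}{\left[\sum_{j=1}^n (X'_{2j} - \bar X'_2)^2 \sum_{j=1}^n (Y'_{2j} - \bar Y'_2)^2\right]^{1/2}} = \frac{a_x a_y \sum_{j=1}^n (X_{1j} - \bar X_1)(Y_{1j} - \bar Y_1)}{\left[a_x^2 \sum_{j=1}^n (X_{1j} - \bar X_1)^2\, a_y^2 \sum_{j=1}^n (Y_{1j} - \bar Y_1)^2\right]^{1/2}} = R_1$. 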


12.7 Supplementary Exercises 


Solutions to Exercises 


1. For the random number generator that I have been using for these solutions, Fig. S.12.7 contains 
one such normal quantile plot. It looks fairly straight. On the horizontal axis I plotted the sorted 
pseudo-normal values and on the vertical axis, I plotted the values $\Phi^{-1}(i/10001)$ for i = 1, ..., 10000. 

Figure S.12.7: Normal quantile plot for Exercise 1 in Sec. 12.7. A straight line has been added for reference. 


2. The plots for this exercise are formed the same way as that in Exercise 1 except we replace the normal 
pseudo-random values by the appropriate gamma pseudo-random values and we replace $\Phi^{-1}$ by the 
quantile function of the appropriate gamma distribution. Two of the plots are in Fig. S.12.8. The plots 
are pretty straight except in the extreme upper tail, where things are expected to be highly variable. 

Figure S.12.8: Gamma quantile plots for Exercise 2 in Sec. 12.7. The left plot has parameters 0.5 and 1 and 
the right plot has parameters 10 and 1. Straight lines have been added for reference. 


3. Once again, the plots are drawn in a fashion similar to Exercise 1. This time, we notice that the plot 
with one degree of freedom has some really serious non-linearity. This is the Cauchy distribution which 
has very long tails. The extreme observations from a Cauchy sample are very variable. Two of the 
plots are in Fig. S.12.9. 

Figure S.12.9: Two t quantile plots for Exercise 3 in Sec. 12.7. The left plot has 1 degree of freedom, and 
the right plot has 20 degrees of freedom. Straight lines have been added for reference. 


4. (a) I simulated 1000 pairs three times and got the following average values: 1.478, 1.462, 1.608. It 
looks like 1000 is not enough to be very confident of getting the average within 0.01. 


(b) Using the same three sets of 1000, I computed the sample variance each time and got 1.8521, 
1.6857, and 2.5373. 


(c) Using (12.2.5), it appears that we need from 120000 to 170000 simulations.


5. (a) To simulate a noncentral t random variable, we can simulate independent Z and W with Z having
the normal distribution with mean 1.936 and variance 1, and W having the $\chi^2$ distribution with
14 degrees of freedom. Then set $T = Z/(W/14)^{1/2}$.


(b) I did three separate simulations of size 1000 each and got the following three proportions with
T > 1.761: 0.571, 0.608, 0.577. The simulation standard errors were 0.01565, 0.01544, and 0.01562.


(c) Using (12.2.5), we find that we need a bit more than 16000 simulated values.
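
A minimal R sketch of this simulation follows. The seed and variable names are illustrative. The helper sims_needed is hypothetical: it assumes that (12.2.5) has the form $v = [\Phi^{-1}((1+\gamma)/2)\,\sigma/\epsilon]^2$ with $\epsilon = 0.01$ and $\gamma = 0.99$, values chosen here because they reproduce the figure of a bit more than 16000.

```r
# Sketch: simulate a noncentral t as T = Z / sqrt(W / 14) (seed and names are illustrative).
set.seed(2)
nsim <- 1000
z <- rnorm(nsim, mean = 1.936, sd = 1)
w <- rchisq(nsim, df = 14)
tsim <- z / sqrt(w / 14)

phat <- mean(tsim > 1.761)              # estimated Pr(T > 1.761)
se <- sqrt(phat * (1 - phat) / nsim)    # simulation standard error of a proportion

# Hypothetical helper; assumed form of (12.2.5) with eps = 0.01 and gamma = 0.99.
sims_needed <- function(sigma2, eps = 0.01, gamma = 0.99) {
  ceiling(sigma2 * (qnorm((1 + gamma) / 2) / eps)^2)
}
v_needed <- sims_needed(0.571 * (1 - 0.571))   # variance from the first reported proportion
```

The estimate phat can also be compared with the exact value from pt(1.761, df = 14, ncp = 1.936, lower.tail = FALSE).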


6. (a) For each sample, we compute the numbers of observations in each of the four intervals $(-\infty, 3.575)$,
$[3.575, 3.912)$, $[3.912, 4.249)$, and $[4.249, \infty)$. Then we compute the Q statistic as we did in
Example 10.1.6. We then compare each Q statistic to the three critical values 7.779, 9.488, and 13.277.
We compute what proportion of the 10000 Q's is above each of these three critical values. I did
three separate simulations of size 10000 each and got the proportions: 0.0495, 0.0536, and 0.0514
for the 0.9 critical value (7.779). I got 0.0222, 0.0247, and 0.0242 for the 0.95 critical value (9.488).
I got 0.0025, 0.0021, and 0.0029 for the 0.99 critical value (13.277). It looks like the test whose
nominal level is 0.1 has size closer to 0.05, while the test whose nominal level is 0.05 has size
closer to 0.025.


(b) For the power calculation, we perform exactly the same calculations with samples from the different
normal distribution. I performed three simulations of size 10000 each for this exercise also. I got
the proportions: 0.5653, 0.5767, and 0.5796 for the 0.9 critical value (7.779). I got 0.4560, 0.4667,
and 0.4675 for the 0.95 critical value (9.488). I got 0.2224, 0.2280, and 0.2333 for the 0.99 critical
value (13.277).
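
The size simulation can be sketched in R as below. Two things are assumptions here and should be checked against the exercise: the sample size n = 23, and the null distribution, taken to be normal with mean 3.912 and standard deviation 0.5 (the interval endpoints above are the quartiles of that distribution, so each interval has null probability 1/4).

```r
# Sketch of the size simulation (n = 23 and the null distribution are assumptions).
set.seed(3)
n <- 23
nsim <- 10000
cuts <- c(3.575, 3.912, 4.249)       # interior endpoints of the four intervals
crit <- c(7.779, 9.488, 13.277)      # critical values quoted in the solution

q_stat <- function(y) {
  # findInterval uses left-closed intervals [a, b), matching the solution
  counts <- tabulate(findInterval(y, cuts) + 1, nbins = 4)
  expected <- length(y) / 4
  sum((counts - expected)^2 / expected)
}

q <- replicate(nsim, q_stat(rnorm(n, mean = 3.912, sd = 0.5)))
size <- sapply(crit, function(cv) mean(q > cv))   # proportion of Q's above each critical value
```

For the power calculation, only the arguments of rnorm would change.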


7. (a) We need to compute the same Q statistics as in Exercise 6(b) using samples from ten different
normal distributions. For each of the ten distributions, we also compute the 0.9, 0.95 and 0.99
sample quantiles of the 10000 Q statistics. Here is a table of the simulated quantiles:


                      Quantile
$\mu$  $\sigma^2$ |  0.9    0.95   0.99
 3.8     0.25     | 3.891  4.976  7.405
 3.8     0.80     | 4.295  5.333  8.788
 3.9     0.25     | 3.653  4.764  6.405
 3.9     0.80     | 4.142  5.133  7.149
 4.0     0.25     | 3.825  5.104  7.405
 4.0     0.80     | 4.554  5.541  8.635
 4.1     0.25     | 3.861  5.255  8.305
 4.1     0.80     | 4.505  5.658  8.637
 4.2     0.25     | 4.193  5.352  8.260
 4.2     0.80     | 4.087  4.981  7.677


The quantiles change a bit as the distributions change, but they are remarkably stable.


(b) Instead of starting with normal samples, we start with samples having a t distribution as described
in the exercise. We compute the Q statistic for each sample and see what proportion of our 10000
Q statistics is greater than 5.2. In three simulations of this sort I got proportions of 0.12, 0.118,
and 0.124.


8. (a) The product of likelihood times prior is

$$\exp\left(-\frac{u_0(\psi-\psi_0)^2}{2} - \sum_{i=1}^p \tau_i\left[\beta_0 + \frac{n_i(\mu_i-\bar{y}_i)^2 + w_i + \lambda(\mu_i-\psi)^2}{2}\right]\right) \times \lambda^{p/2+\gamma_0-1}\exp(-\lambda\delta_0)\prod_{i=1}^p \tau_i^{\alpha_0+[n_i+1]/2-1},$$

where $w_i = \sum_{j=1}^{n_i}(y_{ij}-\bar{y}_i)^2$ for $i=1,\ldots,p$.


As a function of $\mu_i$, $\tau_i$, or $\psi$ this looks the same as it did in Example 12.5.6 except that $\lambda_0$ needs
to be replaced by $\lambda$ wherever it occurs. As a function of $\lambda$, it looks like the p.d.f. of the gamma
distribution with parameters $p/2+\gamma_0$ and $\delta_0 + \sum_{i=1}^p \tau_i(\mu_i-\psi)^2/2$.
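
The extra Gibbs step for $\lambda$ could be sketched in R as follows, in the style of the functions mugen, taugen, and psigen listed at the end of this manual. The function name lambdagen and the list entries gamma0 and delta0 are hypothetical names, and the numbers below are illustrative only.

```r
# Hypothetical helper: simulate a new lambda given the other parameters.
# hyp$gamma0 and hyp$delta0 are assumed names for the gamma prior hyperparameters of lambda.
lambdagen <- function(mu, tau, psi, hyp) {
  p <- length(mu)
  rgamma(1, shape = p / 2 + hyp$gamma0,
         rate = hyp$delta0 + sum(tau * (mu - psi)^2) / 2)
}

# Illustrative values only (not the hot dog data).
set.seed(4)
hyp <- list(gamma0 = 1, delta0 = 1)
mu <- c(150, 160, 120, 155)
tau <- c(0.002, 0.002, 0.002, 0.002)
psi <- 150
draws <- replicate(20000, lambdagen(mu, tau, psi, hyp))
```

In a full chain, the other three simulation functions would then use the current value of lambda wherever hyp$lambda0 appears.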


(b) I ran six Markov chains for 10000 iterations each, producing 60000 parameter vectors. The requested
posterior means and simulation standard errors were


Parameter        $\mu_1$   $\mu_2$  $\mu_3$  $\mu_4$  $1/\tau_1$  $1/\tau_2$  $1/\tau_3$  $1/\tau_4$
Posterior mean   156.9     158.7    118.8    160.6    486.7       598.8       479.2       548.4
Sim. std. err.   0.009583  0.01969  0.02096  0.01322  0.8332      0.8286      0.5481      0.9372


The code at the end of this manual was modified following the suggestions in the exercise in order
to produce the above output. The same was done in Exercise 9.


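9. (a) The product of likelihood times prior is

$$\exp\left(-\frac{u_0(\psi-\psi_0)^2}{2} - \sum_{i=1}^p \tau_i\left[\beta + \frac{n_i(\mu_i-\bar{y}_i)^2 + w_i + \lambda_0(\mu_i-\psi)^2}{2}\right]\right) \times \beta^{p\alpha_0+\epsilon_0-1}\exp(-\beta\phi_0)\prod_{i=1}^p \tau_i^{\alpha_0+[n_i+1]/2-1},$$

where $w_i = \sum_{j=1}^{n_i}(y_{ij}-\bar{y}_i)^2$ for $i=1,\ldots,p$.
As a function of $\mu_i$, $\tau_i$, or $\psi$ this looks the same as it did in Example 12.5.6 except that $\beta_0$ needs
to be replaced by $\beta$ wherever it occurs. As a function of $\beta$, it looks like the p.d.f. of the gamma
distribution with parameters $p\alpha_0+\epsilon_0$ and $\phi_0 + \sum_{i=1}^p \tau_i$.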
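
A sketch of the corresponding Gibbs step for $\beta$ in R, again in the style of the functions at the end of this manual. The function name betagen and the list entries epsilon0 and phi0 are hypothetical, and the numerical values are illustrative.

```r
# Hypothetical helper: simulate a new beta given the tau values.
# hyp$epsilon0 and hyp$phi0 are assumed names for the gamma prior hyperparameters of beta.
betagen <- function(tau, hyp) {
  p <- length(tau)
  rgamma(1, shape = p * hyp$alpha0 + hyp$epsilon0,
         rate = hyp$phi0 + sum(tau))
}

# Illustrative values only.
set.seed(9)
hyp <- list(alpha0 = 1, epsilon0 = 1, phi0 = 0.1)
tau <- c(0.002, 0.002, 0.002, 0.002)
draws <- replicate(20000, betagen(tau, hyp))
```

The function taugen would then use the current value of beta in place of hyp$beta0.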


(b) I ran six Markov chains for 10000 iterations each, producing 60000 parameter vectors. The requested
posterior means and simulation standard errors were


Parameter        $\mu_1$   $\mu_2$  $\mu_3$  $\mu_4$  $1/\tau_1$  $1/\tau_2$  $1/\tau_3$  $1/\tau_4$
Posterior mean   156.6     158.3    120.6    159.7    495.1       609.2       545.3       570.4
Sim. std. err.   0.01576   0.01836  0.02140  0.03844  0.4176      1.194       0.8968      0.7629


10. (a) The numerator of the likelihood ratio statistic is the maximum of the likelihood function over
all parameter values in the alternative hypothesis, while the denominator is the maximum of the
likelihood over all values in the null hypothesis. Both the numerator and denominator have a factor
of $\prod_{i=1}^k \binom{n_i}{X_i}$ that will divide out in the ratio, so we shall ignore these factors. In this example, the
maximum over the alternative hypothesis will be the maximum over all parameter values, so we
would set $p_i = X_i/n_i$ in the likelihood to get

$$\prod_{j=1}^k \left(\frac{X_j}{n_j}\right)^{X_j}\left(1-\frac{X_j}{n_j}\right)^{n_j-X_j} = \frac{\prod_{j=1}^k X_j^{X_j}(n_j-X_j)^{n_j-X_j}}{\prod_{j=1}^k n_j^{n_j}}.$$

For the denominator, all of the $p_i$ are equal, hence the likelihood to be maximized is
$p^{X_1+\cdots+X_k}(1-p)^{n_1+\cdots+n_k-X_1-\cdots-X_k}$. This is maximized at $p = \sum_{j=1}^k X_j / \sum_{j=1}^k n_j$, to yield

$$\left(\frac{\sum_{j=1}^k X_j}{\sum_{j=1}^k n_j}\right)^{\sum_{j=1}^k X_j}\left(\frac{\sum_{j=1}^k (n_j-X_j)}{\sum_{j=1}^k n_j}\right)^{\sum_{j=1}^k (n_j-X_j)} = \frac{\left(\sum_{j=1}^k X_j\right)^{\sum_{j=1}^k X_j}\left(\sum_{j=1}^k (n_j-X_j)\right)^{\sum_{j=1}^k (n_j-X_j)}}{\left(\sum_{j=1}^k n_j\right)^{\sum_{j=1}^k n_j}}.$$

The ratio of these two maxima is a positive constant times the statistic stated in the exercise. The
likelihood ratio test rejects the null hypothesis when the statistic is greater than a constant.


(b) Call the likelihood ratio test statistic T. The distribution of T, under the assumption that $H_0$ is
true, that is $p_1 = \cdots = p_k$, still depends on the common value of the $p_i$'s, call it p. If the sample
sizes are large, the distribution should not depend very much on p, but it will still depend on
p. Let $F_p(\cdot)$ denote the c.d.f. of T when p is the common value of all $p_i$'s. If we reject the null
hypothesis when $T \ge c$, the test will be of level $\alpha_0$ so long as


$1 - F_p(c) \le \alpha_0$, for all p. (S.12.12)


If c satisfies (S.12.12) then all larger c satisfy (S.12.12), so we want the smallest c that satisfies
(S.12.12). Eq. (S.12.12) is equivalent to $F_p(c) \ge 1-\alpha_0$ for all p, which, in turn, is equivalent to
$c \ge F_p^{-1}(1-\alpha_0)$ for all p. The smallest c that satisfies this last inequality is $c = \sup_p F_p^{-1}(1-\alpha_0)$.
To approximate c by simulation, proceed as follows. Pick a collection of reasonable values of p and
a large number v of simulations to perform. For each value of p, perform v simulations as follows.
Simulate k independent binomial random variables with parameters $n_i$ and p, and compute the
value of T. Sort the v values of T and approximate $F_p^{-1}(1-\alpha_0)$ by the $(1-\alpha_0)v$th sorted value.
Let c be the largest of these values over the different chosen values of p. It should be clear that
the distribution of T is the same for p as it is for 1 - p, so one need only check values of p between
0 and 1/2.
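
The procedure can be sketched in R as below. Everything works with log T to avoid underflow. The form of T used in log_T is an assumption (the product of $X_i^{X_i}(n_i-X_i)^{n_i-X_i}$ divided by the pooled version), chosen because it agrees with the observed value reported in part (c). The function names, the grid of p values, and v = 2000 are illustrative.

```r
# Sketch: approximate c = sup_p F_p^{-1}(1 - alpha0) by simulation, on the log scale.
xlogx <- function(x) ifelse(x > 0, x * log(x), 0)   # convention 0 * log(0) = 0

# Assumed form of the statistic (see the lead-in); returns log T.
log_T <- function(x, n) {
  sum(xlogx(x) + xlogx(n - x)) - xlogx(sum(x)) - xlogx(sum(n - x))
}

crit_value <- function(n, alpha0, pgrid, v) {
  q <- sapply(pgrid, function(p) {
    sims <- replicate(v, log_T(rbinom(length(n), n, p), n))
    sort(sims)[ceiling((1 - alpha0) * v)]   # the (1 - alpha0) v-th sorted value
  })
  max(q)                                     # largest value over the grid of p
}

set.seed(5)
n <- c(40, 38, 38, 34)
log_c <- crit_value(n, alpha0 = 0.05, pgrid = seq(0.1, 0.5, by = 0.1), v = 2000)
```

The test would then reject when the observed log T is at least log_c.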


(c) To compute the p-value, we first find the observed value t of T, and then find $\sup_p \Pr(T \ge t)$ under
the assumption that each $p_i = p$ for $i = 1,\ldots,k$. In Table 2.1, the $X_i$ values are $X_1 = 22$,
$X_2 = 25$, $X_3 = 16$, $X_4 = 10$, while the sample sizes are $n_1 = 40$, $n_2 = 38$, $n_3 = 38$, $n_4 = 34$. The
observed value of T is

$$t = \frac{22^{22}\,18^{18}\,25^{25}\,13^{13}\,16^{16}\,22^{22}\,10^{10}\,24^{24}}{73^{73}\,77^{77}} = \exp(-202.17).$$

A pilot simulation showed that the maximum over p of $1 - F_p(t)$ occurs at p = 0.5, so a larger
simulation was performed with p = 0.5. The estimated p-value is 0.01255 with a simulation
standard error of 0.0039.
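
The observed value and a small version of the p-value simulation can be checked in R as follows. As before, the form of T is an assumption chosen to match the displayed value of t, and the seed and v = 2000 are illustrative, so the estimate will not match the one quoted above exactly.

```r
# Sketch: observed log T and a simulated p-value at p = 0.5.
xlogx <- function(x) ifelse(x > 0, x * log(x), 0)   # convention 0 * log(0) = 0
log_T <- function(x, n) {
  sum(xlogx(x) + xlogx(n - x)) - xlogx(sum(x)) - xlogx(sum(n - x))
}

n <- c(40, 38, 38, 34)
x <- c(22, 25, 16, 10)
log_t <- log_T(x, n)                      # log of the observed value of T

set.seed(6)
v <- 2000
sims <- replicate(v, log_T(rbinom(length(n), n, 0.5), n))
pval <- mean(sims >= log_t)               # estimated Pr(T >= t) when p = 0.5
pval_se <- sqrt(pval * (1 - pval) / v)    # simulation standard error of the proportion
```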


11. (a) We shall use the same approach as in Exercise 12 of Sec. 12.6. Let the parameter be $\theta = (\mu, \sigma_1, \sigma_2)$
(where $\mu$ is the common value of $\mu_1 = \mu_2$). Each pair of parameter values $\theta$ and $\theta'$ that have the
same value of $\sigma_2/\sigma_1$ can be obtained from each other by multiplying $\mu$, $\sigma_1$ and $\sigma_2$ by the same
positive constant and adding some other constant to the resulting $\mu$. That is, there exist a > 0 and
b such that $\theta' = (a\mu + b, a\sigma_1, a\sigma_2)$. If $X_1,\ldots,X_m$ and $Y_1,\ldots,Y_n$ have the distribution determined
by $\theta$, then $X_i' = aX_i + b$ for $i = 1,\ldots,m$ and $Y_j' = aY_j + b$ for $j = 1,\ldots,n$ have the distribution
determined by $\theta'$. We need only show that the statistic V in (9.6.13) has the same value when it
is computed using the $X_i$'s and $Y_j$'s as when it is computed using the $X_i'$'s and $Y_j'$'s. It is easy to
see that the numerator of V computed with the $X_i'$'s and $Y_j'$'s equals a times the numerator of V
computed using the $X_i$'s and $Y_j$'s. The same is true of the denominator, hence V has the same
value either way and it must have the same distribution when the parameter is $\theta$ as when the
parameter is $\theta'$.


(b) By the same reasoning as in part (a), the value of $\nu$ is the same whether it is calculated with
the $X_i$'s and $Y_j$'s or with the $X_i'$'s and $Y_j'$'s. Hence the distribution of $\nu$ (thought of as a random
variable before observing the data) depends on the parameter only through $\sigma_2/\sigma_1$.


(c) For each simulation with ratio r, we can simulate $\bar{X}_m$ having the standard normal distribution and
$S_X^2$ having the $\chi^2$ distribution with 9 degrees of freedom. Then simulate $\bar{Y}_n$ having the normal
distribution with mean 0 and variance $r^2$ and $S_Y^2$ equal to $r^2$ times a $\chi^2$ random variable with 10
degrees of freedom. Make the four random variables independent when simulating. Then compute
V and $\nu$. Compute the three quantiles $T_\nu^{-1}(0.9)$, $T_\nu^{-1}(0.95)$ and $T_\nu^{-1}(0.99)$ and check whether V
is greater than each quantile. Our estimates are the proportions of the 10000 simulations in which
the value of V is greater than each quantile. Here are the results from one of my simulations:


Probability 
r 0.9 0.95 0.99 


1.0 | 0.1013 0.0474 0.0079 
1.5 | 0.0976 0.0472 0.0088 
2.0 | 0.0979 0.0506 0.0093 
3.0 | 0.0973 0.0463 0.0110 
5.0 | 0.0962 0.0476 0.0117 
10.0 | 0.1007 0.0504 0.0113 


The upper tail probabilities are very close to their nominal values. 


12. I used the same simulations as in Exercise 11 but computed the statistic U from (9.6.3) instead of V
and compared U to the quantiles of the t distribution with 19 degrees of freedom. The proportions are
below:


Probability 
r 0.9 0.95 0.99 


1.0 | 0.1016 0.0478 0.0086 
1.5 | 0.0946 0.0461 0.0090 
2.0 | 0.0957 0.0483 0.0089 
3.0 | 0.0929 0.0447 0.0112 
5.0 | 0.0926 0.0463 0.0124 
10.0 | 0.0964 0.0496 0.0121 


These values are also very close to the nominal values. 


13. (a) The fact that $E(\hat\beta_1) = \beta_1$ depends only on the fact that each $Y_i$ has mean $\beta_0 + \beta_1 x_i$. It does not
depend on the distribution of $Y_i$ (as long as the distribution has finite mean). Since $\hat\beta_1$ is a linear
function of $Y_1,\ldots,Y_n$, its variance depends only on the variances of the $Y_i$'s (and the fact that
they are independent). It doesn't depend on any other feature of the distribution. Indeed, we can
write

$$\hat\beta_1 = \frac{\sum_{i=1}^n (x_i-\bar{x}_n)Y_i}{\sum_{j=1}^n (x_j-\bar{x}_n)^2} = \sum_{i=1}^n a_i Y_i,$$

where $a_i = (x_i-\bar{x}_n)/\sum_{j=1}^n (x_j-\bar{x}_n)^2$. Then $\mathrm{Var}(\hat\beta_1) = \sum_{i=1}^n a_i^2\,\mathrm{Var}(Y_i)$. This depends only on the
variances of the $Y_i$'s, which do not depend on $\beta_0$ or $\beta_1$.
(b) Let T have the t distribution with k degrees of freedom. Then $Y_i$ has the same distribution as
$\beta_0 + \beta_1 x_i + \sigma T$, whose variance is $\sigma^2\,\mathrm{Var}(T)$. Hence, $\mathrm{Var}(Y_i) = \sigma^2\,\mathrm{Var}(T)$. It follows that

$$\mathrm{Var}(\hat\beta_1) = \sigma^2\,\mathrm{Var}(T)\sum_{i=1}^n a_i^2.$$

Let $v = \mathrm{Var}(T)\sum_{i=1}^n a_i^2$.
(c) There are several possible simulation schemes to estimate v. The simplest might be to notice that

$$\sum_{i=1}^n a_i^2 = \frac{1}{\sum_{i=1}^n (x_i-\bar{x}_n)^2},$$

so that we only need to estimate Var(T). This could be done by simulating lots of t random
variables with k degrees of freedom and computing the sample variance. In fact, we can actually
calculate v in closed form if we wish. According to Exercise 1 in Sec. 8.4, Var(T) = k/(k - 2).
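
Both routes to v can be sketched in R. The design points x and the degrees of freedom k = 10 below are hypothetical, chosen only to illustrate the calculation.

```r
# Sketch: closed-form and simulated values of v (x and k are hypothetical).
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
k <- 10
sxx <- sum((x - mean(x))^2)

a <- (x - mean(x)) / sxx
v_from_a <- (k / (k - 2)) * sum(a^2)   # Var(T) times the sum of the squared a_i
v_closed <- (k / (k - 2)) / sxx        # same value, using sum(a^2) = 1 / sxx

set.seed(7)
var_t_sim <- var(rt(200000, df = k))   # simulation estimate of Var(T)
v_sim <- var_t_sim / sxx
```

The simulated value v_sim should be close to v_closed, and the two closed-form expressions agree exactly up to rounding.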


14. As we noted in Exercise 13(c), the value of v is 


= 3.14 x 107°. 


15. (a) We are trying to approximate the value a that makes $\ell(a) = E[L(\theta,a)\mid x]$ the smallest. We
have a sample $\theta^{(1)},\ldots,\theta^{(v)}$ from the posterior distribution of $\theta$, so we can approximate $\ell(a)$ by
$\hat\ell(a) = \sum_{i=1}^v L(\theta^{(i)},a)/v$. We could then do a search through many values of a to find the value that
minimizes $\hat\ell(a)$. We could use either brute force or mathematical software for minimization. Of
course, we would only have the value of a that minimizes $\hat\ell(a)$ rather than $\ell(a)$.
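
A minimal R sketch of this search uses the built-in one-dimensional minimizer optimize. The posterior sample theta and the loss function below are hypothetical stand-ins; squared error loss is used only because its minimizer is known to be the sample mean, which gives an easy check.

```r
# Sketch: minimize the simulated expected loss (theta and loss are hypothetical stand-ins).
set.seed(8)
theta <- rnorm(5000, mean = 2, sd = 1)        # stand-in for a posterior sample
loss <- function(theta, a) (theta - a)^2      # squared error loss, used only as a check

ell_hat <- function(a) mean(loss(theta, a))   # average loss over the sample
a_hat <- optimize(ell_hat, interval = range(theta))$minimum
```

With a different loss function, only the definition of loss would change.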


(b) To compute a simulation standard error, we could draw several (say k) samples from the posterior
(or split one large sample into k smaller ones) and let $Z_i$ be the value of a that minimizes the ith
version of $\hat\ell$. Then compute S in Eq. (12.2.2) and let the simulation standard error be $S/k^{1/2}$.
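
As a small numerical illustration in R, the five minimizing values reported in Exercise 16 of this section can play the role of $Z_1,\ldots,Z_5$. The sketch assumes that S in Eq. (12.2.2) divides the sum of squared deviations by k; that assumption reproduces the standard error quoted there.

```r
# Sketch: simulation standard error from k repeated minimizations.
z <- c(182.644, 182.641, 182.548, 182.57, 182.645)   # values reported in Exercise 16
k <- length(z)
S <- sqrt(mean((z - mean(z))^2))   # assumed form of S: divisor k rather than k - 1
sim_se <- S / sqrt(k)
```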


16. (In the displayed formula, on the right side of the = sign, all $\theta$'s should have been $\mu$'s.) The posterior
hyperparameters are all given in Example 12.5.2, so we can simulate as many $\mu$ values as we want to
estimate the posterior mean of $L(\theta,a)$. We simulated 100000 t random variables with 22 degrees of
freedom and multiplied each one by 15.214 and added 183.95 to get a sample of $\mu$ values. For each
value of a near 183.95, we computed $\hat\ell(a)$ and found that a = 182.644 gave the smallest value. We then
repeated the entire exercise for a total of five times. The other four a values were 182.641, 182.548,
182.57 and 182.645. The simulation standard error is then 0.0187.


R Code For Two Text Examples


If you are not using R or if you are an expert, you should not bother reading this section.

The code below (with comments that start #) is used to perform the calculations in Examples 12.5.6 
and 12.5.7 in the text. The reason that the code appears to be so elaborate is that I realized that Exercises 8 
and 9 in Sec. 12.7 asked to perform essentially the same analysis, each with one additional parameter. 
Modifying the code below to handle those exercises is relatively straightforward. Significantly less coding 
would be needed if one were going to perform the analysis only once. For example, one would need the three 
functions that simulate each parameter given the others, plus the function called hierchain that could be 
used both for burn-in and the later runs. The remaining calculations could be done by typing some additional 
commands at the R prompt or in a text file to be sourced. 

In the first printing, there was an error in these examples. For some reason (my mistake, obviously) the
$w_i$ values were recorded in reverse order when the simulations were performed. That is, $w_4$ was used as if
it were $w_1$, $w_3$ was used as if it were $w_2$, etc. The $\bar{y}_i$ and $n_i$ values were in the correct order, otherwise the
error could have been fixed by reordering the hot dog type names, but no such luck. Because the $w_i$ were
such different numbers, the effect on the numerical output was substantial. Most notably, the means of the
$1/\tau_i$ are not nearly so different as stated in the first printing.

The data file hotdogs.csv contains four columns separated by commas with the data in Table 11.15 
along with a header row: 


Beef,Meat,Poultry,Specialty
186,173,129,155
181,191,132,170
176,182,102,114
149,190,106,191
184,172,94,162
190,147,102,146
158,146,87,140
139,139,99,187
175,175,107,180
148,136,113,
152,179,135,
111,153,142,
141,107,86,
153,195,143,
190,135,152,
157,140,146,
131,138,144,
149,,,
135,,,
132,,,


The commas with nothing after them indicate that the data in the next column has run out already, and
NA (not available) values will be produced in R. Most R functions have sensible ways to deal with NA values,
generally by including the optional argument na.rm=T or something similar. By default, R uses all values
(including NA's) to compute things like mean or var. Hence, the result will be NA if one does not change
the default.
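
A short illustration of this behavior, using a made-up vector with one missing value:

```r
# Illustration of NA handling (x is a made-up vector).
x <- c(186, 181, NA)
m_default <- mean(x)              # NA, because the default is na.rm = FALSE
m_clean <- mean(x, na.rm = TRUE)  # 183.5, the mean of the two available values
n_obs <- sum(!is.na(x))           # 2, the number of values that are not NA
```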

First, we list some code that sets up the data, summary statistics, and prior hyperparameters. The lines 
that appear below were part of a file hotdogmcmc-example.r. They were executed by typing 


Copyright © 2012 Pearson Education, Inc. Publishing as Addison-Wesley. 




source("hotdogmcmc-example.r")
at the R prompt in a command window. The source function reads a file of text and treats each line as if 
it had been typed at the R prompt. 


# Read the data from a comma-separated file with a header row.
hotdogs=read.table("hotdogs.csv",header=T,sep=",")
# Compute the summary statistics
# First, the sample sizes: how many are not NA?
n=apply(hotdogs,2,function(x){sum(!is.na(x))})
# Next, the sample means: remember to remove the NA values
ybar=apply(hotdogs,2,mean,na.rm=T)
# Next, the wi values (sum of squared deviations from sample mean)
w=apply(hotdogs,2,var,na.rm=T)*(n-1)
# Set the prior hyperparameters:
hyp=list(lambda0=1,alpha0=1,beta0=0.1,u0=0.001,psi0=170)
# Set the initial values of parameters. These will be perturbed to be
# used as starting values for independent Markov chains.
tau=(n-1)/w
psi=(hyp$psi0*hyp$u0+hyp$lambda0*sum(tau*ybar))/(hyp$u0+hyp$lambda0*sum(tau))
mu=(n*ybar+hyp$lambda0*psi)/(n+hyp$lambda0)


Next, we list a series of functions that perform major parts of the calculation. The programs are written
specifically for these examples, using variable names like ybar, n, w, mu, tau, and psi so that the reader can
easily match what the programs are doing to the example. If one had wished to have a general hierarchical
model program, one could have made the programs more generic at the cost of needing special routines to
deal with the particular structure of the examples. Each of these functions is stored in a text file, and the
source function is used to read the lines which in turn define the function for use by R. That is, after each
file has been "sourced," the function whose name appears to the left of the = sign becomes available for
use. Its arguments appear in parentheses after the word function on the first line.

First, we have the functions that simulate the next values of the parameters in each Markov chain: 


mugen=function(i,tau,psi,n,ybar,w,hyp){
#
# Simulate a new mu[i] value
#
(n[i]*ybar[i]+hyp$lambda0*psi)/(n[i]+hyp$lambda0)+rnorm(1,0,1)/sqrt(tau[i]*
 (n[i]+hyp$lambda0))
}

taugen=function(i,mu,psi,n,ybar,w,hyp){
#
# Simulate a new tau[i] value
#
rgamma(1,hyp$alpha0+0.5*(n[i]+1))/(hyp$beta0+0.5*(w[i]+n[i]*(mu[i]-ybar[i])^2+
 hyp$lambda0*(mu[i]-psi)^2))
}

psigen=function(mu,tau,n,ybar,w,hyp){
#
# Simulate a new psi value
#
(hyp$psi0*hyp$u0+hyp$lambda0*sum(tau*mu))/(hyp$u0+hyp$lambda0*sum(tau))+
 rnorm(1,0,1)/sqrt(hyp$u0+hyp$lambda0*sum(tau))
}


Next is the function that does burn-in and computes the F statistics described in the text. If the F
statistics are too large, this function would have to be run again from the start with more burn-in. One could
rewrite the function to allow it to start over from the end of the previous burn-in if one wished. (One would
have to preserve the accumulated means and sums of squared deviations.)


burnchain=function(nburn,start,nchain,mu,tau,psi,n,ybar,w,hyp,stand){
#
# Perform "nburn" burn-in for "nchain" Markov chains and check the F statistics
# starting after "start". The initial values are "mu", "tau", "psi"
# and are perturbed by "stand" times random variables. The data are
# "n", "ybar", "w". The prior hyperparameters are "hyp".
#
# ngroup is the number of groups
ngroup=length(ybar)
# Set up the perturbed starting values for the different chains.
# First, store 0 in all values
muval=matrix(0,nchain,ngroup)
tauval=muval
psival=rep(0,nchain)
# Next, for each chain, perturb the starting values using random
# normals or lognormals
for(l in 1:nchain){
 muval[l,]=mu+stand*rnorm(ngroup)/sqrt(tau)
 tauval[l,]=tau*exp(rnorm(ngroup)*stand)
 psival[l]=psi+stand*rnorm(1)/sqrt(hyp$u0)
# Save the starting vectors for all chains just so we can see what
# they were.
 startvec=cbind(muval,tauval,psival)
}
# The next matrices/vectors will store the accumulated means "...a" and sums
# of squared deviations "...v" so that we don't need to store all of the
# burn-in simulations when computing the F statistics.
# See Exercise 23(b) in Sec. 7.10 of the text.
muacca=matrix(0,nchain,ngroup)
tauacca=muacca
psiacca=rep(0,nchain)
muaccv=muacca
tauaccv=muacca
psiaccv=psiacca
# The next matrix will store the burn-in F statistics so that we can
# see if we need more burn-in.
fs=matrix(0,nburn-start+1,2*ngroup+1)
# Loop through the burn-in
for(i in 1:nburn){
# Loop through the chains
 for(l in 1:nchain){
# Loop through the coordinates
  for(j in 1:ngroup){
# Generate the next mu
   muval[l,j]=mugen(j,tauval[l,],psival[l],n,ybar,w,hyp)
# Accumulate the average mu (muacca) and the sum of squared deviations (muaccv)
   muaccv[l,j]=muaccv[l,j]+(i-1)*(muval[l,j]-muacca[l,j])^2/i
   muacca[l,j]=muacca[l,j]+(muval[l,j]-muacca[l,j])/i
# Do the same for tau
   tauval[l,j]=taugen(j,muval[l,],psival[l],n,ybar,w,hyp)
   tauaccv[l,j]=tauaccv[l,j]+(i-1)*(tauval[l,j]-tauacca[l,j])^2/i
   tauacca[l,j]=tauacca[l,j]+(tauval[l,j]-tauacca[l,j])/i
  }
# Do the same for psi
  psival[l]=psigen(muval[l,],tauval[l,],n,ybar,w,hyp)
  psiaccv[l]=psiaccv[l]+(i-1)*(psival[l]-psiacca[l])^2/i
  psiacca[l]=psiacca[l]+(psival[l]-psiacca[l])/i
 }
# Once we have enough burn-in, start computing the F statistics (see
# p. 826 in the text)
 if(i>=start){
  mub=i*apply(muacca,2,var)
  muw=apply(muaccv,2,mean)/(i-1)
  taub=i*apply(tauacca,2,var)
  tauw=apply(tauaccv,2,mean)/(i-1)
  psib=i*var(psiacca)
  psiw=mean(psiaccv)/(i-1)
  fs[i-start+1,]=c(mub/muw,taub/tauw,psib/psiw)
 }
}
# Return a list with useful information: the last value of each
# parameter for all chains, the F statistics, the input information,
# and the starting vectors. The return value will contain enough
# information to allow us to start all the Markov chains and
# simulate them as long as we wish.
list(mu=muval,tau=tauval,psi=psival,fstat=fs,nburn=nburn,start=start,
 n=n,ybar=ybar,w=w,hyp=hyp,nchain=nchain,startvec=startvec)
}
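
The one-pass updates used above for the accumulated means ("...a") and sums of squared deviations ("...v") can be checked in isolation. The helper running_stats and the data vector below are illustrative and are not part of the original program.

```r
# Sketch: one-pass update of a running mean and sum of squared deviations.
running_stats <- function(x) {
  a <- 0   # accumulated mean
  v <- 0   # accumulated sum of squared deviations
  for (i in seq_along(x)) {
    v <- v + (i - 1) * (x[i] - a)^2 / i   # update the sum of squares using the old mean
    a <- a + (x[i] - a) / i               # then update the mean
  }
  list(mean = a, ss = v)
}

x <- c(156.2, 158.9, 151.4, 160.3, 149.8)   # illustrative values
rs <- running_stats(x)
```

The results agree with mean(x) and var(x)*(length(x)-1), which is why the burn-in simulations themselves never need to be stored.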
A similar, but simpler, function will simulate a single chain after we have finished burn-in: 


hierchain=function(nsim,mu,tau,psi,n,ybar,w,hyp){
#
# Run a Markov chain for "nsim" simulations from initial values "mu",
# "tau", "psi"; the data are "n", "ybar", "w"; the prior
# hyperparameters are "hyp".
#
# ngroup is the number of groups
ngroup=length(ybar)
# Set up matrices to hold the simulated parameter values
psiex=rep(0,nsim)
muex=matrix(0,nsim,ngroup)
tauex=muex
# Loop through the simulations
for(i in 1:nsim){
# Loop through the coordinates
 for(j in 1:ngroup){
# Generate the next value of mu
  temp=mugen(j,tau,psi,n,ybar,w,hyp)
  mu[j]=temp
# Store the value of mu
  muex[i,j]=temp
# Do the same for tau
  temp=taugen(j,mu,psi,n,ybar,w,hyp)
  tau[j]=temp
  tauex[i,j]=temp
 }
# Do the same for psi
 temp=psigen(mu,tau,n,ybar,w,hyp)
 psi=temp
 psiex[i]=temp
}
# Return a list with useful information: The simulated values
list(mu=muex,tau=tauex,psi=psiex)
}


Next, we have a function that will run several independent chains and put the results together. It calls 
the previous function once for each chain. 


stackchains=function(burn,nsim){
#
# Starting from the information in "burn", obtained from "burnchain",
# run "nsim" additional simulations for each chain and stack the
# results on top of each other. The
# results from chain i can be extracted by using rows
# (i-1)*nsim+1 to i*nsim of each parameter matrix
#
# Set up storage for parameter values
muex=NULL
tauex=NULL
psiex=NULL
# Loop through the chains
for(l in 1:burn$nchain){
# Extract the last burn-in parameter value for chain l
 mu=burn$mu[l,]
 tau=burn$tau[l,]
 psi=burn$psi[l]
# Run the chain nsim times
 temp=hierchain(nsim,mu,tau,psi,burn$n,burn$ybar,burn$w,burn$hyp)
# Extract the simulated values from each chain and stack them.
 muex=rbind(muex,temp$mu)
 tauex=rbind(tauex,temp$tau)
 psiex=c(psiex,temp$psi)
}
# Return a list with useful information: the simulated values, the
# number of simulations per chain, and the number of chains.
list(mu=muex,tau=tauex,psi=psiex,nsim=nsim,nchain=burn$nchain)
}


The calculations done in Example 12.5.6 begin by applying the above functions and then manipulating 
the output. The following commands were typed at the R command prompt >. Notice that some of them 
produce output that appears in the same window in which the typing is done. The summary statistics appear 
in Table 12.4 in the text (after correcting the errors). 


> # Do the burn-in
> hotdog.burn=burnchain(100,100,6,mu,tau,psi,n,ybar,w,hyp,2)
> # Note that the F statistics are all less than 1+0.44m=45.
> hotdog.burn$fstat
         [,1]      [,2]     [,3]     [,4]      [,5]      [,6]     [,7]     [,8]
[1,] 0.799452 0.7756733 1.464278 1.831631 0.9673807 0.4030658 1.161147 2.727503
         [,9]
[1,] 0.548359
> # Now run each chain 10000 more times.
> hotdog.mcmc=stackchains(hotdog.burn,10000)
> # Obtain the data for the summary table in the text
> apply(hotdog.mcmc$mu,2,mean)
[1] 156.5894 158.2559 120.5360 159.5841
> sqrt(apply(hotdog.mcmc$mu,2,var))
[1] 4.893067 5.825234 5.552140 7.615332
> apply(1/hotdog.mcmc$tau,2,mean)
[1] 495.6348 608.4955 542.8819 568.2482
> sqrt(apply(1/hotdog.mcmc$tau,2,var))
[1] 166.0203 221.1775 201.6250 307.3618
> mean(hotdog.mcmc$psi)
[1] 151.0273
> sqrt(var(hotdog.mcmc$psi))
[1] 11.16116


Next, we source a file that computes values that we can use to assess how similar/different the four groups 
of hot dogs are. 


# Compute the six ratios of precisions (or variances)
hotdog.ratio=cbind(hotdog.mcmc$tau[,1]/hotdog.mcmc$tau[,2],
 hotdog.mcmc$tau[,1]/hotdog.mcmc$tau[,3],hotdog.mcmc$tau[,1]/hotdog.mcmc$tau[,4],
 hotdog.mcmc$tau[,2]/hotdog.mcmc$tau[,3],hotdog.mcmc$tau[,2]/hotdog.mcmc$tau[,4],
 hotdog.mcmc$tau[,3]/hotdog.mcmc$tau[,4])
# For each simulation, find the maximum ratio. We need to include one over
# each ratio also.
hotdog.rmax=apply(cbind(hotdog.ratio,1/hotdog.ratio),1,max)
# Compute the six differences between means.
hotdog.diff=cbind(hotdog.mcmc$mu[,1]-hotdog.mcmc$mu[,2],
 hotdog.mcmc$mu[,2]-hotdog.mcmc$mu[,3],hotdog.mcmc$mu[,3]-hotdog.mcmc$mu[,4],
 hotdog.mcmc$mu[,4]-hotdog.mcmc$mu[,1],hotdog.mcmc$mu[,1]-hotdog.mcmc$mu[,3],
 hotdog.mcmc$mu[,2]-hotdog.mcmc$mu[,4])
# For each simulation, find the minimum, maximum and average absolute
# differences.
hotdog.min=apply(abs(hotdog.diff),1,min)
hotdog.max=apply(abs(hotdog.diff),1,max)
hotdog.ave=apply(abs(hotdog.diff),1,mean)


Using the results of the above calculations, we now type commands at the prompt that answer various
questions. First, what proportion of the time is one of the ratios of standard deviations at least 1.5 (ratio
of variances at least 2.25)? In this calculation the vector hotdog.rmax>2.25 has coordinates that are either
TRUE (1) or FALSE (0) depending on whether the maximum ratio is greater than 2.25 or not. The mean is
then the proportion of TRUEs.


> mean(hotdog.rmax>2.25)
[1] 0.3982667
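
The same idiom on a tiny made-up vector shows how the mean of a logical vector gives a proportion:

```r
# Illustration: the mean of a logical vector is the proportion of TRUE values.
rmax <- c(1.8, 2.6, 3.1, 2.0)   # made-up maximum ratios
flags <- rmax > 2.25            # FALSE TRUE TRUE FALSE
prop <- mean(flags)             # 0.5, since two of the four values exceed 2.25
```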


Next, compute the 0.01 quantile of the maximum absolute difference between the means, the
median of the minimum difference, and the 0.01 quantile of the average difference. In 99% of the simulations,
the difference was greater than the 0.01 quantile.


> quantile(hotdog.max,0.01)
     1%
26.3452
> median(hotdog.min)
[1] 2.224152
> quantile(hotdog.ave,0.01)
      1%
13.77761


In Example 12.5.7, we needed to simulate a pair of observations $(Y_1, Y_3)$ from each parameter vector and
then approximate the 0.05 and 0.95 quantiles of the distribution of $Y_1 - Y_3$ for a prediction interval. The
next function allows one to compute a general function of the parameters and find simulation standard errors
using Eq. (12.5.1), that is $S/k^{1/2}$.


mcmcse=function(simobj,func,entire=FALSE){
#
# Start with the result of a simulation "simobj", compute a vector function
# "func" from each chain, and then compute formula (12.5.1) for each
# coordinate as well as the covariance matrix. If "entire" is TRUE,
# it also computes the function value on the entire parameter
# matrix. This may differ from the average over the chains if "func"
# is not an average and/or if it does additional simulation.
# Also computes the average of the "func"
# values. The function "func" must take as arguments matrices of mu,
# tau, and psi values with each row from a single simulation. It
# must return a real vector. For example, if you want two
# quantiles of the distribution of f(Yi,Yj) where Yi comes from group
# i and Yj comes from group j, func should loop through the rows of
# its input and simulate a pair (Yi,Yj) for each parameter. Then it
# should return the appropriate sample quantiles of the simulated
# values of f(Yi,Yj).
#
# k is the number of chains, nsim the number of simulations
k=simobj$nchain
nsim=simobj$nsim
# Loop through the chains
for(i in 1:k){
# Extract the parameters for chain i (rows (i-1)*nsim+1 to i*nsim)
 mu=simobj$mu[((i-1)*nsim+1):(i*nsim),]
 tau=simobj$tau[((i-1)*nsim+1):(i*nsim),]
 psi=simobj$psi[((i-1)*nsim+1):(i*nsim)]
# Compute the function value based on the parameters of chain i
 if(i==1){
  valf=func(mu,tau,psi)
 }else{
  valf=rbind(valf,func(mu,tau,psi))
 }
}
# p is how many functions were computed
p=ncol(valf)
# compute the average of each function
ave=apply(valf,2,mean)
# compute formula (12.5.1) for each function
se=sqrt(apply(valf,2,var)*(k-1))/k
#
# Return the average function value, formula (12.5.1), and covariance
# matrix. The covariance matrix can be useful if you want to
# compute a further function of the output and then compute a
# simulation standard error for that further function. Also computes
# the function on the entire parameter set if "entire=TRUE".
if(entire){
 list(ave=ave,se=se,covmat=cov(valf)*(k-1)/k^2,
  entire=func(simobj$mu,simobj$tau,simobj$psi))
}else{
 list(ave=ave,se=se,covmat=cov(valf)*(k-1)/k^2)
}
}
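
The expression used for se in mcmcse is the same thing as $S/k^{1/2}$ when S is computed with divisor k. A small check in R, with hypothetical per-chain values standing in for one column of valf:

```r
# Sketch: the se expression in mcmcse equals S / sqrt(k) (z holds hypothetical per-chain values).
z <- c(-18.2, -18.9, -18.4, -18.7, -18.5, -18.8)
k <- length(z)
se_code <- sqrt(var(z) * (k - 1)) / k    # the form used in mcmcse
S <- sqrt(sum((z - mean(z))^2) / k)      # S with divisor k (assumed form of the formula)
se_formula <- S / sqrt(k)
```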


The specific function func used in Example 12.5.7 is: 


hotdog13=function(mu,tau,psi){
#
# Compute the 0.05 and 0.95 quantiles of predictive distribution of Y1-Y3
#
n=nrow(mu)
# Make a place to store the differences.
vals=rep(0,n)
# Loop through the parameter vectors.
for(i in 1:n){
# Simulate a difference.
 vals[i]=rnorm(1)/sqrt(tau[i,1])+mu[i,1]-(rnorm(1)/
  sqrt(tau[i,3])+mu[i,3])
}
# Return the desired quantiles
quantile(vals,c(0.05,0.95))
}


Finally, we use the above functions to compute the prediction interval in Example 12.5.7 along with the 
simulation standard errors. 


> hotdog.pred=mcmcse(hotdog.mcmc,hotdog13,T)
> hotdog.pred
$ave
       5%       95%
-18.57540  90.20092

$se
       5%       95%
0.2228034 0.4345629

$covmat
            5%        95%
5%  0.04964136 0.07727458
95% 0.07727458 0.18884493

$entire
       5%       95%
-18.49283  90.62661


The final line hotdog.pred$entire gives the prediction interval based on the entire collection of 60,000
simulations. The one listed as hotdog.pred$ave is the average of the six intervals based on the six independent
Markov chains. There is not much difference between them. The simulation standard errors show up as
hotdog.pred$se. Remember that the numbers in the first printing don't match these because of the error
mentioned earlier.